Abstract: We show that the control of the false discovery rate (FDR) for
a multiple testing procedure is implied by two coupled simple sufficient
conditions. The first one, which we call "self-consistency condition", concerns
the algorithm itself, and the second, called "dependency control condition",
is related to the dependency assumptions on the p-value family. Many standard
multiple testing procedures are self-consistent (e.g. step-up, step-down
or step-up-down procedures), and we prove that the dependency control
condition can be fulfilled when choosing correspondingly appropriate rejection
functions, in three classical types of dependency: independence, positive
dependency (PRDS) and unspecified dependency. As a consequence,
we recover earlier results through simple and unifying proofs while extending
their scope in several regards: weighted FDR, p-value reweighting, a new
family of step-up procedures under unspecified p-value dependency, and
adaptive step-up procedures. We give additional examples of other possible
applications. This framework also allows for defining and studying FDR
control for multiple testing procedures over a continuous, uncountable space
of hypotheses.

1. Introduction

A multiple testing procedure is defined as an algorithm taking as input some
(randomly generated) data $X \in \mathcal{X}$ and returning a set $R(X)$ of
rejected hypotheses, which is a subset of the set $\mathcal{H}$ of initial
candidate null hypotheses. The false discovery rate (FDR) of the procedure is
then defined as the expected proportion of null hypotheses in $R(X)$ which are
in fact true and thus incorrectly rejected. Following its introduction by
Benjamini and Hochberg (1995), the FDR criterion has emerged recently as a
widely used standard for a majority of applications involving simultaneous
testing of a large number of hypotheses. It is generally required that a
multiple testing procedure $R$ has its FDR bounded by a certain level $\alpha$
fixed in advance.

Our main point in this work is to show that FDR control is implied by two
simple conditions. The first one, which we call self-consistency condition,
requires that any rejected hypothesis $h \in R(X)$ should have its p-value
$p_h(X)$ smaller than a threshold $\Delta_\beta(|R(X)|)$ which itself depends
on the volume of rejected hypotheses $|R(X)|$, and on a fixed functional
parameter $\beta$. The second one, called dependency control condition,
requires that for each true null hypothesis $h$, the couple of real variables
$(U, V) = (p_h, |R(X)|)$ satisfies the inequality (for any $c > 0$, and the
same function $\beta$ as in the first condition):

$$\mathbb{E}\left[\frac{\mathbf{1}\{U \le c\,\beta(V)\}}{V}\right] \le c. \tag{1}$$

The first condition only concerns how the data is processed to produce the
decision, and is hence purely algorithmic. It can easily be checked for several
classical multiple testing procedures, such as step-down, step-up or more
generally step-up-down procedures. In this condition, the function $\beta$
controls how the threshold increases with respect to the volume of rejected
hypotheses. In particular, for step-wise procedures, $\beta$ corresponds (up to
a proportionality constant) to the rejection function used to cut the curve of
ordered p-values. The second condition, on the other hand, is essentially
probabilistic in nature. More precisely, we can show that (1) can be satisfied
under relatively broad assumptions on the dependency of $(U, V)$. In turn, as
will be shown in more detail in the paper, this implies that the second
condition is largely independent of the exact procedure $R$, but rather is
related to the dependency assumptions between the p-values.

The two conditions are not independent of each other: they are coupled
through the same functional parameter $\beta$, appearing in (1) as well as in
the definition of the threshold $\Delta_\beta$. The function $\beta$, called
shape function, is assumed to be nondecreasing but otherwise arbitrary; if
there exists $\beta$ such that the two corresponding conditions are satisfied,
this entails FDR control.

The main advantage of this approach when controlling the FDR is that it
allows us to abstract the particulars of a specific multiple testing procedure,
in order to concentrate on proving the bound (1). This results in short proofs
which in particular do not resort explicitly to p-value reordering.

We then present different types of applications of the result. This approach
is first used to show that several well-known results on FDR control (mainly
concerning step-up or step-down procedures based on a linear rejection function)
are recovered in a synthetic way (e.g., results of Benjamini and Hochberg, 1995,
1997; Benjamini and Yekutieli, 2001; Sarkar, 2002; Genovese et al., 2006). We
also derive the following new results:

• some classical results on step-up procedures are extended to weighted
procedures (weighted FDR and/or p-value weighting), under independence or
dependence of the p-values;

• a new family of step-up procedures which control the FDR is presented,
under unspecified dependencies between the p-values;

• we present a simple, exemplary application of this approach to the problem
of adaptive procedures, where an estimate of the proportion $\pi_0$ of true
null hypotheses in $\mathcal{H}$ is included in the procedure with the aim of
increasing power;

• the case of a continuous space of hypotheses is briefly investigated (which
can be relevant for instance when the underlying observation is modelled
as a stochastic process);

• the results of Benjamini and Liu (1999a) and Romano and Shaikh (2006a)
on a specific type of step-down procedure are extended to the cases of
positive dependencies (under a PRDS-type condition) and unspecified
dependencies.

To put this in perspective, let us emphasize again that the conditions
proposed here are only sufficient and certainly not necessary: naturally, there
are many examples of multiple testing procedures that are known to have
controlled FDR but do not satisfy the coupled conditions presented here
(including some particular step-up and step-down procedures). The message that
we nevertheless want to convey is that these conditions are able to cover at
once an interesting range of classical existing results on FDR control as well
as provide a useful technical tool. It was pointed out to us that a result
similar in spirit to ours will appear in the forthcoming paper by Finner et al.
(2008); this is discussed in more detail in Section 5.1.

This paper is organized as follows: in Section 2, we introduce the framework
and the two conditions, and we prove that, taken together, they imply FDR
control. The self-consistency and dependency control conditions are then
studied separately in Section 3, leading to specific assumptions, respectively,
on the procedure itself (e.g. step-down, step-up) and on the dependency between
the p-values (independence, PRDS, unspecified dependencies). The applications
summarized above are detailed in Section 4. Some technical proofs are deferred
to the appendix.

2. Two sufficient conditions for FDR control

2.1. Preliminaries and notations

Let $(\mathcal{X}, \mathfrak{X}, P)$ be a probability space, with $P$ belonging
to a set or "model" $\mathcal{P}$ of distributions, which can be parametric or
non-parametric. Formally, a null hypothesis is a subset $h \subset \mathcal{P}$
of distributions on $(\mathcal{X}, \mathfrak{X})$. We say that $P$ satisfies
$h$ when $P \in h$.

In the multiple testing framework, one is interested in determining
simultaneously whether or not $P$ satisfies distinct null hypotheses belonging
to a certain set $\mathcal{H}$ of candidate hypotheses. Below, we will always
assume that $\mathcal{H}$ is at most countable (except specifically in Section
4.4, where we mention extensions to continuous sets of hypotheses). We denote
by $\mathcal{H}_0(P) = \{h \in \mathcal{H} \mid P \text{ satisfies } h\}$
the set of null hypotheses satisfied by $P$, called the set of true null
hypotheses. We denote by $\mathcal{H}_1(P) = \mathcal{H} \setminus
\mathcal{H}_0(P)$ the set of false null hypotheses for $P$.

A multiple testing procedure returns a subset $R(x) \subset \mathcal{H}$ of
rejected hypotheses based on a realization $x$ of a random variable $X \sim P$.

Definition 2.1 (Multiple testing procedure). A multiple testing procedure $R$
on $\mathcal{H}$ is a function $R : x \in \mathcal{X} \mapsto R(x) \subset
\mathcal{H}$, such that for any $h \in \mathcal{H}$, the indicator function
$\mathbf{1}\{h \in R(x)\}$ is measurable. The hypotheses $h \in R$ are the
rejected null hypotheses of the procedure $R$.

We will only consider, as is usually the case, multiple testing procedures $R$
which can be written as a function $R(\mathbf{p})$ of a family of p-values
$\mathbf{p} = (p_h, h \in \mathcal{H})$. For this, we must assume that for each
null hypothesis $h \in \mathcal{H}$, there exists a p-value function $p_h$,
defined as a measurable function $p_h : \mathcal{X} \to [0,1]$, such that if
$h$ is true, the distribution of $p_h(X)$ is stochastically lower bounded by a
uniform random variable on $[0, 1]$:

$$\forall P \in \mathcal{P},\ \forall h \in \mathcal{H}_0(P),\ \forall t \in [0,1],\quad \mathbb{P}_{X \sim P}\left[p_h(X) \le t\right] \le t.$$

A type I error occurs when a true null hypothesis $h$ is wrongly rejected, i.e.
when $h \in R(x) \cap \mathcal{H}_0(P)$. There are several different ways to
measure quantitatively the collective type I error of a multiple testing
procedure. In this paper, we will exclusively focus on the false discovery rate
(FDR) criterion, introduced by Benjamini and Hochberg (1995) and which has
since become a widely used standard.

The FDR is defined as the averaged proportion of type I errors in the set of
all the rejected hypotheses. This "error proportion" will be defined in terms of
a volume ratio, and to this end we introduce $\Lambda$, a finite positive
measure on $\mathcal{H}$. In the remainder of this paper we will assume such a
volume measure has been fixed and denote, for any subset $S \subset
\mathcal{H}$, $|S| = \Lambda(S)$.

Definition 2.2 (False discovery rate). Let $R$ be a multiple testing procedure
on $\mathcal{H}$. The false discovery rate (FDR) is defined as

$$\mathrm{FDR}(R, P) := \mathbb{E}_{X \sim P}\left[\frac{|R(X) \cap \mathcal{H}_0(P)|}{|R(X)|}\,\mathbf{1}\{|R(X)| > 0\}\right]. \tag{2}$$



G. Blanchard and E. Roquain/ Sufficient conditions for FDR control 






Throughout this paper we will use the following notational convention:
whenever there is an indicator function inside an expectation, this has logical
priority over any other factor appearing in the expectation. What we mean is
that if other factors include expressions that may not be defined (such as the
ratio outside of the set defined by the indicator), this is safely ignored. In
other terms, any indicator function present implicitly entails that we perform
integration over the corresponding set only. This results in more compact
notation, such as in the above definition.
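
As a small illustration of Definition 2.2 and of the indicator convention, the
following sketch computes the realized error proportion that sits inside the
expectation in (2), for a finite set of hypotheses. The function name
`false_discovery_proportion` and the dictionary `volume` (standing for the
weights $\Lambda(\{h\})$, with the counting measure as default) are
illustrative choices and not notation from the text.

```python
def false_discovery_proportion(rejected, true_nulls, volume=None):
    """Return |R ∩ H0| / |R|, read as 0 when |R| = 0 (indicator convention).

    `volume` maps a hypothesis to its weight Lambda({h}); hypotheses that are
    missing from it get weight 1, i.e. the counting measure (an assumption).
    """
    volume = volume or {}

    def vol(subset):
        return sum(volume.get(h, 1.0) for h in subset)

    rejected = set(rejected)
    denominator = vol(rejected)
    if denominator <= 0:
        # The indicator 1{|R| > 0} vanishes: the undefined ratio is ignored.
        return 0.0
    return vol(rejected & set(true_nulls)) / denominator


print(false_discovery_proportion({"a", "b", "c", "d"}, {"a"}))        # 0.25
print(false_discovery_proportion(set(), {"a"}))                       # 0.0
print(false_discovery_proportion({"a", "b"}, {"a"}, {"a": 3.0}))      # 0.75
```

The FDR itself is the expectation of this quantity over the data, so in
practice it can only be approximated, for instance by averaging over simulated
data sets.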

For the sake of simplifying the exposition, we will (as is usually the accepted
convention) most often drop in the notation a certain number of dependencies,
such as writing $R$ or $p_h$ instead of $R(X)$, $p_h(X)$ and $\mathcal{H}_0$,
$\mathcal{H}_1$, $\mathrm{FDR}(R)$ instead of $\mathcal{H}_0(P)$,
$\mathcal{H}_1(P)$, $\mathrm{FDR}(R, P)$. We will also omit the fact that the
probabilities or expectations are performed with respect to $X \sim P$.
Generally speaking, we will implicitly assume that $P$ is fixed, but that all
relevant assumptions and results should in fact hold for any $P \in
\mathcal{P}$. For example, our main goal will be to derive upper bounds on
$\mathrm{FDR}(R, P)$ valid for all $P \in \mathcal{P}$; this will be formulated
simply as a bound on $\mathrm{FDR}(R)$.

Remark 2.3. (Role of $\Lambda$ and weighted FDR in the finite case) When the
space of hypotheses is finite, the "standard" FDR in multiple testing
literature is the one defined using $|\cdot|$ equal to the counting measure
(cardinality) on a finite space and will be referred to as "standard $\Lambda$
weighting". The notation $|\cdot|$ was kept here to allow notation
compatibility with this case and to alleviate some notational burden. We stress
however that in the case $\mathcal{H}$ is countably infinite, the volume
measure $\Lambda$ cannot be the cardinality, since we assume it to be finite.

The possibility of using different weights $\Lambda(\{h\})$ for particular
hypotheses $h$ leads to the so-called "weighted FDR". In general, the measure
$\Lambda$ represents the relative importance, or criticality, of committing an
error about different hypotheses, and can be dictated by external constraints.
As discussed by Benjamini and Hochberg (1997) and Benjamini and Heller (2007),
controlling the "weighted FDR" can be of interest in some specific
applications. For instance, in the situation where each hypothesis concerns a
whole cluster of voxels in a brain map, it can be relevant to increase the
importance of large discovered clusters when counting the discoveries in the
FDR. Note finally that $\Lambda$ can be rescaled arbitrarily since only volume
ratios matter in the FDR.

2.2. Self-consistency, dependency control and the false discovery rate

It is commonly the case that multiple testing procedures are defined as level
sets of the p-values:

$$R = \{h \in \mathcal{H} \mid p_h \le t\}, \tag{3}$$

where $t$ is a (possibly data-dependent) threshold. We will be more
particularly interested in thresholds that specifically depend on a real
parameter $r$ and possibly on the hypothesis $h$ itself, as introduced in the
next definition.

Definition 2.4 (Threshold collection). A threshold collection $\Delta$ is a
function

$$\Delta : (h, r) \in \mathcal{H} \times \mathbb{R}^+ \mapsto \Delta(h, r) \in \mathbb{R}^+,$$

which is nondecreasing in its second variable. A factorized threshold collection
is a threshold collection $\Delta$ with the particular form: $\forall (h, r)
\in \mathcal{H} \times \mathbb{R}^+$,

$$\Delta(h, r) = \alpha\,\pi(h)\,\beta(r),$$

where $\pi : \mathcal{H} \to [0, 1]$ is called the weight function and $\beta :
\mathbb{R}^+ \to \mathbb{R}^+$ is a nondecreasing function called the shape
function. Given a threshold collection $\Delta$, the $\Delta$-thresholding-based
multiple testing procedure at rejection volume $r$ is defined as

$$L_\Delta(r) := \{h \in \mathcal{H} \mid p_h \le \Delta(h, r)\}. \tag{4}$$

Let us discuss the role of the parameter $r$ and proceed to the first of the
two announced sufficient conditions. Remember our goal is to upper bound
$\mathrm{FDR}(R)$, where the volume of rejected hypotheses $|R|$ appears as the
denominator in the expectation. Hence, intuitively, whenever this volume gets
larger, we can globally allow more type I errors, and thus take a larger
threshold for the p-values. Therefore, the rejection volume parameter $r$ in
the definition above should be picked as an (increasing) function of $|R|$.
Formally, this leads to the following "self-referring" property:

Definition 2.5 (Self-consistency condition). Given a factorized threshold
collection of the form $\Delta(h, r) = \alpha\pi(h)\beta(r)$, a multiple
testing procedure $R$ satisfies the self-consistency condition with respect to
the threshold collection $\Delta$ if the following inclusion holds a.s.:

$$R \subset L_\Delta(|R|). \tag{$\mathrm{SC}(\alpha, \pi, \beta)$}$$
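
To make the "self-referring" nature of this condition concrete, here is a
minimal sketch for a finite set of hypotheses. It builds the set
$L_\Delta(r)$ for a factorized threshold collection and tests the inclusion
$R \subset L_\Delta(|R|)$. The helper names `threshold_set` and
`is_self_consistent`, the dictionary-based representation, and the numerical
values are all hypothetical and only serve the illustration.

```python
def threshold_set(pvalues, alpha, pi, beta, r):
    """L_Delta(r): hypotheses whose p-value is at most alpha * pi(h) * beta(r)."""
    return {h for h, p in pvalues.items() if p <= alpha * pi[h] * beta(r)}


def is_self_consistent(rejected, pvalues, alpha, pi, beta, volume):
    """Check that R is included in L_Delta(|R|), |R| being the volume of R."""
    r = sum(volume[h] for h in rejected)
    return set(rejected) <= threshold_set(pvalues, alpha, pi, beta, r)


# Hypothetical example: counting measure, pi(h) = 1/4, linear shape beta(r) = r.
pvalues = {"a": 0.01, "b": 0.02, "c": 0.04, "d": 0.5}
pi = {h: 0.25 for h in pvalues}
volume = {h: 1.0 for h in pvalues}
linear = lambda r: r

print(is_self_consistent({"a", "b", "c"}, pvalues, 0.1, pi, linear, volume))  # True
print(is_self_consistent({"c"}, pvalues, 0.1, pi, linear, volume))            # False
```

In this example the set {c} fails the check although the larger set {a, b, c}
passes it: rejecting more hypotheses raises the threshold, which is exactly the
mechanism described before the definition.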

Next, we introduce the following probabilistic condition on two dependent
real variables:

Definition 2.6 (Dependency control condition). Let $\beta : \mathbb{R}^+ \to
\mathbb{R}^+$ be a nondecreasing function. A couple $(U, V)$ of (possibly
dependent) nonnegative real random variables is said to satisfy the dependency
control condition with shape function $\beta$ if the following inequalities
hold:

$$\forall c > 0,\quad \mathbb{E}\left[\frac{\mathbf{1}\{U \le c\,\beta(V)\}}{V}\right] \le c. \tag{$\mathrm{DC}(\beta)$}$$



The following elementary but fundamental result is the main cornerstone 
linking the FDR control to conditions SC and DC. 

Proposition 2.7. Let $\beta : \mathbb{R}^+ \to \mathbb{R}^+$ be a
(nondecreasing) shape function, $\pi : \mathcal{H} \to [0,1]$ a weight function
and $\alpha$ a positive number. Assume that the multiple testing procedure $R$
is such that:

(i) the self-consistency condition $\mathrm{SC}(\alpha, \pi, \beta)$ is
satisfied;

(ii) for any $h \in \mathcal{H}_0$ the couple $(p_h, |R|)$ satisfies
$\mathrm{DC}(\beta)$.

Then $\mathrm{FDR}(R) \le \alpha\,\Pi(\mathcal{H}_0)$, where $d\Pi = \pi\,
d\Lambda$, i.e., $\Pi(\mathcal{H}_0) := \sum_{h \in \mathcal{H}_0}
\Lambda(\{h\})\,\pi(h)$.

Proof. From (2),

$$\mathrm{FDR}(R) = \mathbb{E}\left[\frac{|R \cap \mathcal{H}_0|}{|R|}\,\mathbf{1}\{|R| > 0\}\right] = \sum_{h \in \mathcal{H}_0} \Lambda(\{h\})\,\mathbb{E}\left[\frac{\mathbf{1}\{h \in R\}}{|R|}\right]$$

$$\le \sum_{h \in \mathcal{H}_0} \Lambda(\{h\})\,\mathbb{E}\left[\frac{\mathbf{1}\{p_h \le \alpha\pi(h)\beta(|R|)\}}{|R|}\right] \le \alpha \sum_{h \in \mathcal{H}_0} \Lambda(\{h\})\,\pi(h),$$

where we have used successively conditions (i) and (ii) for the two above
inequalities; condition (ii) is applied with $c = \alpha\pi(h)$. □

Let us point out the important difference in nature between the two sufficient
conditions: for a fixed shape function $\beta$, the self-consistency condition
(i) concerns only the algorithm itself (and not the random structure of the
problem). On the other hand, the dependency control condition (ii) seems to
involve both the algorithm and the statistical nature of the problem. However,
we will show below in Section 3.2 that this latter condition can be checked
under a weak, general and quite natural assumption on the algorithm itself
(namely that $|R(\mathbf{p})|$ is a nonincreasing function of the p-values),
and primarily depends on the dependency structure of the p-values. (Moreover,
in the case of arbitrary dependencies, we will consider a special family of
shape functions $\beta$ which satisfy the condition without any assumptions on
the algorithm.) Hence, the interest of the above proposition is that it
effectively separates the problem of FDR control into a purely algorithmic and
an (almost) purely probabilistic sufficient condition. The link between the two
conditions is the common shape function $\beta$: the dependency assumptions
between the p-values will determine for which shape function the condition
$\mathrm{DC}(\beta)$ is valid; in turn, this will impose constraints on the
algorithm through condition $\mathrm{SC}(\alpha, \pi, \beta)$.

Remark 2.8. (Role of $\pi$ and p-value weighting in the finite case) To
understand intuitively the role of the weight function $\pi$, assume
$\mathcal{H}$ is of finite cardinality $m$ and take for simplification
$\beta(r) = 1$ for now. Consider the corresponding testing procedure
$L_\Delta$: the rejected hypotheses are those for which $p'_h := p_h/(m\pi(h))
\le \alpha/m$, where $p'_h$ is the weighted p-value of $h$. If $\pi(h)$ is
constant equal to $1/m$, we have $p'_h = p_h$ and the above is just
Bonferroni's procedure, which has family-wise error rate (FWER) controlled by
$\alpha$. If $\pi$ is, more generally, an arbitrary probability distribution on
$\mathcal{H}$, the above is a weighted Bonferroni procedure and also has FWER
less than $\alpha$ (see, e.g., Wasserman and Roeder, 2006). In this example,
$\pi$ represents the relative importance, or weight of evidence, that is given
a priori to p-values, and thus plays the role of a prior that can be fixed
arbitrarily by the user. Its role in the control of FDR is very similar; the
use of weighted p-values for FDR control has been proposed earlier, for example
by Genovese et al. (2006). When $\mathcal{H}$ is of finite cardinality $m$, we
will refer to the choice $\pi(h) = 1/m$, in conjunction with $\Lambda$ being
the cardinality measure, as the "standard $\Lambda$-$\pi$ weighting".

More generally, following Proposition 2.7, control of the FDR at level $\alpha$
is ensured as soon as the weight function $\pi$ is chosen as a probability
density with respect to $\Lambda$ (i.e. $\sum_{h \in \mathcal{H}}
\Lambda(\{h\})\,\pi(h) = 1$). When $\mathcal{H}$ is of finite cardinality $m$
and with the "standard $\Lambda$-$\pi$ weighting" defined above, we obtain
$\mathrm{FDR} \le \alpha m_0/m \le \alpha$ (where $m_0$ denotes the number of
true null hypotheses).
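
The bound $\alpha m_0/m$ can be checked empirically. The sketch below is a
plain Monte Carlo experiment under assumptions of our choosing: independent
p-values, uniform for the true nulls and distributed as $U^4$ (with $U$
uniform) for the false nulls, and the linear step-up procedure as the
self-consistent procedure. The function names, the sample sizes and the
alternative distribution are hypothetical and only illustrative.

```python
import random


def linear_step_up(pvalues, alpha):
    """Indices rejected by the step-up procedure with linear shape beta(r) = r."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    r_hat = 0
    for r in range(1, m + 1):
        if pvalues[order[r - 1]] <= alpha * r / m:
            r_hat = r
    return order[:r_hat]


def estimate_fdr(m=20, m0=15, alpha=0.2, n_rep=4000, seed=0):
    """Monte Carlo estimate of the FDR with independent p-values.

    Hypotheses 0..m0-1 are true nulls (uniform p-values); the remaining ones
    get p-values U**4, an arbitrary stochastically smaller alternative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        p = [rng.random() for _ in range(m0)]
        p += [rng.random() ** 4 for _ in range(m - m0)]
        rejected = linear_step_up(p, alpha)
        if rejected:  # the ratio counts as 0 when nothing is rejected
            total += sum(1 for i in rejected if i < m0) / len(rejected)
    return total / n_rep


estimate = estimate_fdr()
print("estimated FDR:", estimate, "bound alpha*m0/m:", 0.2 * 15 / 20)
```

Up to simulation noise, the estimate should not exceed the bound; such a
simulation illustrates the statement but of course does not prove it.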

Remark 2.9. Proposition 2.7 can be readily extended to the case where we use
different volume measures for the numerator and denominator of the FDR.
However, since it is not clear to us whether such an extension would be of
practical interest, we choose in this paper to deal only with a single volume
measure.

3. Study of the two sufficient conditions 

In this section, we take a closer look at conditions $\mathrm{SC}(\alpha, \pi,
\beta)$ and $\mathrm{DC}(\beta)$, and study typical situations where they are
satisfied.

3.1. Self-consistency condition and step-up procedures

The main examples of self-consistent procedures are step-up procedures. In fact,
for a fixed choice of parameters $(\alpha, \beta, \pi)$, step-up procedures
output the largest set of rejected hypotheses such that $\mathrm{SC}(\alpha,
\pi, \beta)$ is satisfied, and are in this sense optimal with respect to that
condition. Here, we define step-up procedures by this characterizing property,
thus avoiding the usual definition using the reordering of the p-values.

Definition 3.1 (Step-up procedure). Let $\Delta$ be a factorized threshold
collection of the form $\Delta(h, r) = \alpha\pi(h)\beta(r)$. The step-up
multiple testing procedure $R$ associated to $\Delta$ is given by either of the
following equivalent definitions:

(i) $R = L_\Delta(\hat{r})$, where $\hat{r} := \max\{r \ge 0 \mid |L_\Delta(r)| \ge r\}$;
(ii) $R = \bigcup \{A \subset \mathcal{H} \mid A \text{ satisfies } \mathrm{SC}(\alpha, \pi, \beta)\}$.

Additionally, $\hat{r}$ satisfies $|L_\Delta(\hat{r})| = \hat{r}$; equivalently,
the step-up procedure $R$ satisfies $\mathrm{SC}(\alpha, \pi, \beta)$ with
equality.

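Proof of the equivalence between (i) and (ii). Note that, since $\Delta$ is
assumed to be nondecreasing in its second variable, $L_\Delta(r)$ is a
nondecreasing set as a function of $r \ge 0$. Therefore, $|L_\Delta(r)|$ is a
nondecreasing function of $r$ and the supremum appearing in (i) is a maximum,
i.e. $|L_\Delta(\hat{r})| \ge \hat{r}$. It is easy to see that
$|L_\Delta(\hat{r})| = \hat{r}$ because this would otherwise contradict the
definition of $\hat{r}$: if $r' := |L_\Delta(\hat{r})| > \hat{r}$, then
$|L_\Delta(r')| \ge |L_\Delta(\hat{r})| = r'$, so that $r'$ would be a larger
admissible value. Hence $L_\Delta(\hat{r}) = L_\Delta(|L_\Delta(\hat{r})|)$, so
$L_\Delta(\hat{r})$ satisfies $\mathrm{SC}(\alpha, \pi, \beta)$ (with equality)
and is included in the set union appearing in (ii). Conversely, for any set $A$
satisfying $A \subset L_\Delta(|A|)$, we have $|L_\Delta(|A|)| \ge |A|$, so
that $|A| \le \hat{r}$ and $A \subset L_\Delta(\hat{r})$. □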

[Figure 1: two panels, "Step-down" and "Step-up"]

Fig 1. Pictorial representation of the step-up (and step-down) thresholds, and
(in grey) of all thresholds $r \in \{1, \dots, m\}$ for which $L_\Delta(r)$
satisfies the self-consistency condition. The p-values and the rejection
function represented here have been picked arbitrarily and in a deliberately
exaggerated fashion in order to illustrate the different procedures; they are
not meant to represent realistic data or a realistic model. This picture
corresponds to the standard $\Lambda$-$\pi$ weighting only.

When $\mathcal{H}$ is finite of cardinality $m$, endowed with the standard
$\Lambda$-weighting $\Lambda(\cdot) = \mathrm{Card}(\cdot)$, Definition 3.1 is
equivalent to the classical definition of a step-up procedure, based on
reordering the p-values: for any $h \in \mathcal{H}$, denote by $p'_h :=
p_h/(m\pi(h))$ the weighted p-value of $h$ (in the case $\pi(h) = 0$, we put
$p'_h = +\infty$ if $p_h > 0$ and $p'_h = 0$ if $p_h = 0$), and consider the
ordered weighted p-values

$$p'_{(1)} \le p'_{(2)} \le \dots \le p'_{(m)}.$$

Since $L_\Delta(r) = \{h \in \mathcal{H} \mid p'_h \le \alpha\beta(r)/m\}$, the
condition $|L_\Delta(r)| \ge r$ is equivalent to $p'_{(r)} \le
\alpha\beta(r)/m$. Hence, the step-up procedure associated to $\Delta$ defined
in Definition 3.1 rejects all the $\hat{r}$ smallest weighted p-values, where
$\hat{r}$ corresponds to the "last right crossing" point between the ordered
weighted p-values $p'_{(\cdot)}$ and the scaled shape function
$\alpha\beta(\cdot)/m$:

$$\hat{r} = \max\{r \in \{0, \dots, m\} \mid p'_{(r)} \le \alpha\beta(r)/m\},$$

with $p'_{(0)} := 0$; see Figure 1 for an illustration. For the standard
$\pi$-weighting $\pi(h) = 1/m$, the weighted p-values are simply the p-values.
In particular:

• The step-up procedure associated to the linear shape function $\beta(r) = r$
is the well-known linear step-up procedure of Benjamini and Hochberg
(1995).

• The step-up procedure associated to the linear shape function $\beta(r) =
r \left(\sum_{i=1}^{m} \frac{1}{i}\right)^{-1}$ is the distribution-free linear
step-up procedure of Benjamini and Yekutieli (2001).
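
The "last right crossing" rule above can be sketched in a few lines for the
standard weighting (counting measure and $\pi(h) = 1/m$). The function names
`step_up`, `linear_shape` and `distribution_free_shape`, as well as the
p-values, are hypothetical choices made for this illustration.

```python
def step_up(pvalues, alpha, beta):
    """Step-up procedure under the standard weighting.

    Returns the indices of the r_hat smallest p-values, where r_hat is the
    largest r in {0, ..., m} such that p_(r) <= alpha * beta(r) / m.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    r_hat = 0
    for r in range(1, m + 1):
        if pvalues[order[r - 1]] <= alpha * beta(r) / m:
            r_hat = r  # keep the last crossing; earlier failures do not stop us
    return sorted(order[:r_hat])


def linear_shape(r):
    """beta(r) = r, the linear shape function."""
    return r


def distribution_free_shape(m):
    """beta(r) = r / (1 + 1/2 + ... + 1/m)."""
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    return lambda r: r / harmonic


pvalues = [0.9, 0.03, 0.035, 0.031]
print(step_up(pvalues, 0.05, linear_shape))                # [1, 2, 3]
print(step_up(pvalues, 0.05, distribution_free_shape(4)))  # []
```

In this example the two smallest p-values lie above the linear rejection
function, yet three hypotheses are rejected because the third ordered p-value
lies below it. The smaller distribution-free shape function leads to no
rejection on the same data.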

Finally, let us point out that step-down and more generally step-up-down
procedures are also self-consistent. The latter class of step-wise procedures
was introduced by Tamhane et al. (1998), and contains step-up and step-down
procedures as particular cases. These procedures select in a certain way among
the "crossing points" between the p-value function and some fixed rejection
function (for example, on Figure 1, there are only two non-zero crossing points
to choose from). More formally, and under arbitrary weighting, given a
parameter $\lambda \in [0, |\mathcal{H}|]$, the step-up-down procedure with
threshold collection $\Delta$ and of order $\lambda$ is defined as
$L_\Delta(\hat{r}_\lambda)$, where either $\hat{r}_\lambda := \max\{r \ge
\lambda \mid \forall r', \lambda \le r' \le r, |L_\Delta(r')| \ge r'\}$ if
$|L_\Delta(\lambda)| \ge \lambda$; or $\hat{r}_\lambda := \max\{r < \lambda
\mid |L_\Delta(r)| \ge r\}$ otherwise. In words, assuming the standard
weighting case and $\lambda$ an integer, if $p_{(\lambda)}$ is smaller than the
rejection function at $\lambda$, the closest crossing point to the right of
$\lambda$ is picked, otherwise the closest crossing point to the left. In
particular, the step-up-down procedure of order $\lambda = |\mathcal{H}|$ is
simply the step-up procedure (based on the same threshold collection). The case
$\lambda = 0$ is the step-down procedure. Although generalized step-up-down
procedures are not maximal with respect to condition SC like the plain step-up,
the fact that they still satisfy that condition is worth noticing.
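
The verbal description of the step-up-down rule can be sketched as follows.
This is only an illustration under explicit assumptions: the standard
weighting, an integer order, and candidate values of $r$ restricted to the
integers $0, \dots, m$ (as in the verbal description, rather than all real
$r$). The names `step_up_down` and `order_lambda` are hypothetical.

```python
def step_up_down(pvalues, alpha, beta, order_lambda):
    """Step-up-down procedure of integer order `order_lambda` (standard weighting).

    Assumption: r only ranges over the integers 0, ..., m.
    """
    m = len(pvalues)

    def crosses(r):
        # |L(r)| >= r for the counting measure and pi(h) = 1/m
        return sum(p <= alpha * beta(r) / m for p in pvalues) >= r

    if crosses(order_lambda):
        # walk to the right while the crossing condition keeps holding
        r = order_lambda
        while r < m and crosses(r + 1):
            r += 1
    else:
        # closest admissible point to the left; r = 0 always qualifies
        r = max(k for k in range(order_lambda) if crosses(k))
    threshold = alpha * beta(r) / m
    return sorted(i for i, p in enumerate(pvalues) if p <= threshold)


linear = lambda r: r
pvalues = [0.9, 0.03, 0.035, 0.031]
print(step_up_down(pvalues, 0.05, linear, 0))  # step-down: []
print(step_up_down(pvalues, 0.05, linear, 4))  # step-up: [1, 2, 3]
print(step_up_down(pvalues, 0.05, linear, 2))  # []
```

With order 0 the sketch behaves as a step-down procedure and stops at the first
failure, while with order $m$ it returns the step-up rejection set. In both
cases the output is of the form $L_\Delta(r)$ with $|L_\Delta(r)| \ge r$, hence
self-consistent.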

3.2. Dependency control condition

In this section, we show that condition (ii) of Proposition 2.7 holds under
different types of assumptions on the dependency of the p-values. We will
follow the different types of dependencies considered by Benjamini and
Yekutieli (2001), namely independent, positive dependency under the PRDS
condition and arbitrarily dependent p-values. In each case, we have to prove
$\mathrm{DC}(\beta)$ for specific conditions on the variables $(U, V)$,
resulting in specific choices for the shape function $\beta$.

We start the section with a probabilistic lemma collecting the technical tools
used to deal with each situation.

Lemma 3.2. Let $(U, V)$ be a couple of nonnegative random variables such
that $U$ is stochastically lower bounded by a uniform variable on $[0,1]$, i.e.
$\forall t \in [0,1]$, $\mathbb{P}(U \le t) \le t$. Then the dependency control
condition $\mathrm{DC}(\beta)$ is satisfied by $(U, V)$ under any of the
following situations:

(i) $\beta(x) = x$ and $V = g(U)$, where $g : \mathbb{R}^+ \to \mathbb{R}^+$ is
a nonincreasing function.

(ii) $\beta(x) = x$ and the conditional distribution of $V$ given $U \le u$ is
stochastically decreasing in $u$, that is,

$$\text{for any } r \ge 0, \text{ the function } u \mapsto \mathbb{P}(V < r \mid U \le u) \text{ is nondecreasing.} \tag{5}$$

(iii) The shape function is of the form

$$\beta_\nu(r) = \int_0^r x \, d\nu(x), \tag{6}$$

where $\nu$ is an arbitrary probability distribution on $(0, \infty)$, and $V$
is arbitrary.

The proof is found in the appendix. Note that there is some redundancy in the
lemma since (i) is a particular case of (ii), but this subcase has a
particularly simple proof and is of independent interest because it corresponds
to the case of independent p-values (as will be detailed below).
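
As a numerical sanity check of case (i), the sketch below approximates the
left-hand side of the dependency control inequality
$\mathbb{E}[\mathbf{1}\{U \le c\beta(V)\}/V] \le c$ when $U$ is exactly uniform
on $[0,1]$ and $V = g(U)$ for a nonincreasing $g$. The function
`dependency_control_lhs`, the midpoint-rule discretization and the particular
map `g` are assumptions made for the illustration.

```python
def dependency_control_lhs(g, beta, c, n_grid=100000):
    """Midpoint-rule approximation of E[ 1{U <= c*beta(V)} / V ] with V = g(U),
    for U uniform on [0, 1]. Assumes g takes positive values."""
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) / n_grid
        v = g(u)
        if u <= c * beta(v):
            total += 1.0 / v
    return total / n_grid


g = lambda u: 10.0 - 9.0 * u   # hypothetical nonincreasing map with values in [1, 10]
identity = lambda x: x         # linear shape function beta(x) = x

results = {c: dependency_control_lhs(g, identity, c) for c in (0.01, 0.05, 0.5, 2.0)}
for c, lhs in results.items():
    print("c =", c, " left-hand side =", lhs)
```

For each value of $c$ the approximated left-hand side should stay below $c$,
in line with the lemma. A check of this kind covers one particular pair
$(U, V)$ only and is no substitute for the proof.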

We now apply this result to prove that for any $h \in \mathcal{H}_0$, the
couple of variables $(p_h, |R|)$ satisfies $\mathrm{DC}(\beta)$, under the
different dependency assumptions on the p-values, and for the correspondingly
appropriate functions $\beta$ given by the lemma. The only additional
assumption we will make on the procedure $R$ itself is that it has
nonincreasing volume as a function of the p-values (and this assumption will
not be required in the case of arbitrarily dependent p-values).

3.2.1. Independent case

Proposition 3.3. Assume that the collection of p-values $\mathbf{p} = (p_h, h
\in \mathcal{H})$ forms an independent family of random variables. Let
$R(\mathbf{p})$ be a multiple testing procedure such that $|R(\mathbf{p})|$ is
nonincreasing in each p-value $p_h$ such that $h \in \mathcal{H}_0$. For any $h
\in \mathcal{H}$, denote by $\mathbf{p}_{-h}$ the collection of p-values $(p_g
: g \in \mathcal{H}, g \ne h)$.

Then for any $h \in \mathcal{H}_0$ and for the linear shape function $\beta(x)
= x$, the couple of variables $(p_h, |R|)$ satisfies $\mathrm{DC}(\beta)$, in
which the expectation is taken conditionally on the p-values of
$\mathbf{p}_{-h}$. As a consequence, it also satisfies $\mathrm{DC}(\beta)$
unconditionally.

Proof. By the independence assumption, the distribution of $U = p_h$
conditionally on $\mathbf{p}_{-h}$ is identical to its marginal and therefore
stochastically lower bounded by a uniform distribution. The value of
$\mathbf{p}_{-h}$ being held fixed, $|R(\mathbf{p})| = |R((\mathbf{p}_{-h},
p_h))|$ can be written as a nonincreasing function $g$ of $p_h$ by the
assumption on $R$. We conclude by part (i) of Lemma 3.2. □

Remark 3.4. Note that Proposition 3.3 is still valid under the slightly weaker
assumption that for all $h \in \mathcal{H}_0$, $p_h$ is independent of the
family $(p_g, g \ne h)$ (in particular, the p-values of $(p_h, h \in
\mathcal{H}_1)$ need not be mutually independent).

3.2.2. Positive dependencies (PRDS)

From point (ii) of Lemma 3.2, each couple $(p_h, |R|)$ satisfies
$\mathrm{DC}(\beta)$ with $\beta(x) = x$ under the following condition (weaker
than independence):

$$\text{for any } r \ge 0, \text{ the function } u \mapsto \mathbb{P}(|R| < r \mid p_h \le u) \text{ is nondecreasing.} \tag{7}$$

Following Benjamini and Yekutieli (2001), we state a dependency condition
ensuring that $(p_h, |R|)$ satisfies (7). For this, we recall the definition of
positive regression dependency on each one from a subset (PRDS) (introduced by
Benjamini and Yekutieli, 2001, where its relationship to other notions of
positive dependency is also discussed). Remember that a subset $D \subset [0,
1]^{\mathcal{H}}$ is called nondecreasing if for all $z, z' \in [0,
1]^{\mathcal{H}}$ such that $z \le z'$ (i.e. $\forall h \in \mathcal{H}$, $z_h
\le z'_h$), we have $z \in D \Rightarrow z' \in D$.

Definition 3.5. For $\mathcal{H}'$ a subset of $\mathcal{H}$, the p-values of
$\mathbf{p} = (p_h, h \in \mathcal{H})$ are said to be positively regressively
dependent on each one from $\mathcal{H}'$ (denoted in short by PRDS on
$\mathcal{H}'$), if for any $h \in \mathcal{H}'$, for any measurable
nondecreasing set $D \subset [0, 1]^{\mathcal{H}}$, the function $u \mapsto
\mathbb{P}(\mathbf{p} \in D \mid p_h = u)$ is nondecreasing.

We can now state the following proposition: 

Proposition 3.6. Suppose that the p-values of $\mathbf{p} = (p_h, h \in
\mathcal{H})$ are PRDS on $\mathcal{H}_0$, and consider a multiple testing
procedure $R$ such that $|R(\mathbf{p})|$ is nonincreasing in each p-value.
Then for any $h \in \mathcal{H}_0$, the couple of variables $(p_h, |R|)$
satisfies $\mathrm{DC}(\beta)$ for the linear shape function $\beta(x) = x$.

Proof. We merely check that condition (7) is satisfied. For any fixed $r \ge
0$, put $D = \{z \in [0, 1]^{\mathcal{H}} \mid |R(z)| < r\}$. It is clear from
the assumptions on $R$ that $D$ is a nondecreasing measurable set. Then by
elementary considerations, the PRDS condition (applied using the set $D$
defined above) implies (7). The latter argument was also used by Benjamini and
Yekutieli (2001) with a reference to Lehmann (1966). We provide here a succinct
proof of this fact in the interest of remaining self-contained.

Under the PRDS condition, for all $u \le u'$, putting $\gamma =
\mathbb{P}[p_h \le u \mid p_h \le u']$,

$$\begin{aligned}
\mathbb{P}[\mathbf{p} \in D \mid p_h \le u'] &= \mathbb{E}\big[\mathbb{P}[\mathbf{p} \in D \mid p_h] \,\big|\, p_h \le u'\big] \\
&= \gamma\, \mathbb{E}\big[\mathbb{P}[\mathbf{p} \in D \mid p_h] \,\big|\, p_h \le u\big] + (1 - \gamma)\, \mathbb{E}\big[\mathbb{P}[\mathbf{p} \in D \mid p_h] \,\big|\, u < p_h \le u'\big] \\
&\ge \mathbb{E}\big[\mathbb{P}[\mathbf{p} \in D \mid p_h] \,\big|\, p_h \le u\big] = \mathbb{P}[\mathbf{p} \in D \mid p_h \le u],
\end{aligned}$$

where we have used the definition of PRDS for the last inequality. □

3.2.3. Unspecified dependencies

We now consider a totally generic setting with no assumption on the dependency
structure between the p-values nor on the structure of the multiple testing
procedure $R$. Using point (iii) of Lemma 3.2, we immediately obtain the
following result:

Proposition 3.7. Let $\beta_\nu$ be a shape function of the form (6). Then for
any $h \in \mathcal{H}_0$, the couple of variables $(p_h, |R|)$ satisfies
$\mathrm{DC}(\beta_\nu)$, for any multiple testing procedure $R$.

Note that a shape function of the form (6) must satisfy $\beta_\nu(r) \le r$,
with strict inequality except for at most one point besides zero (some examples
will be discussed below in Section 4.2). Therefore, the price to pay here is a
more conservative dependency control inequality, in turn resulting in a more
restrictive self-consistency condition when using this shape function. This
form of shape function was initially introduced by Blanchard and Fleuret
(2007), where some links between multiple testing and statistical learning
theory were exposed.
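
For a discrete distribution $\nu$, a shape function of the form
$\beta_\nu(r) = \int_0^r x\,d\nu(x)$ reduces to a finite sum, which makes the
inequality $\beta_\nu(r) \le r$ easy to inspect. The sketch below assumes that
$\nu$ is given by a list of support points and masses; the function name
`shape_from_prior` and both example distributions are hypothetical. One can
check by direct computation that taking $\nu(\{k\})$ proportional to $1/k$ on
$\{1, \dots, m\}$ gives $\beta_\nu(r) = r / \sum_{i=1}^m \frac{1}{i}$ at integer
$r$, the shape function of the distribution-free linear step-up procedure.

```python
def shape_from_prior(support, masses):
    """Shape function beta_nu(r) = sum of x * nu({x}) over support points x <= r,
    for a discrete distribution nu (masses are normalized to sum to one)."""
    total = float(sum(masses))

    def beta(r):
        return sum(x * w for x, w in zip(support, masses) if x <= r) / total

    return beta


m = 5
harmonic = sum(1.0 / k for k in range(1, m + 1))

# nu({k}) proportional to 1/k on {1, ..., m}
beta_by = shape_from_prior(list(range(1, m + 1)), [1.0 / k for k in range(1, m + 1)])
# nu = point mass at 2: equality beta(r) = r holds only at r = 2
beta_point = shape_from_prior([2], [1.0])

print(beta_by(3), 3 / harmonic)
print([beta_point(r) for r in (1, 2, 3)])  # [0.0, 2.0, 2.0]
```

The point-mass example shows the situation mentioned above: $\beta_\nu(r) = r$
at the single point $r = 2$, and $\beta_\nu(r) < r$ at every other positive
$r$.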

4. Applications

4.1. The linear step-up procedure with $\Lambda$-$\pi$ weighting

We have seen earlier in Section 3.1 that step-up procedures satisfy the
self-consistency condition. Furthermore, it is easy to see that the rejected
volume of a step-up procedure is nonincreasing as a function of the p-values.
Using this in conjunction with Proposition 3.3 (resp. Proposition 3.6) and
Proposition 2.7, we obtain the following result for the ($\Lambda$-weighted)
FDR control of the ($\pi$-weighted) linear step-up procedure, that is, the
step-up procedure associated to the threshold collection $\Delta(h, r) =
\alpha\pi(h) r$.

Theorem 4.1. For any finite and positive volume measure $\Lambda$, the
($\pi$-weighted) linear step-up procedure $R$ has its ($\Lambda$-weighted) FDR
upper bounded by $\Pi(\mathcal{H}_0)\alpha$, where $\Pi(\mathcal{H}_0) :=
\sum_{h \in \mathcal{H}_0} \Lambda(\{h\})\pi(h)$, in either of the following
cases:

• the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$ are independent.

• the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$.

Again, the statement is redundant since independence is a particular case of
PRDS; we simply wanted to recall that the treatment of the independent case
is particularly simple. This theorem essentially recovers and unifies some known
results concerning particular cases: the two points of the theorem were
respectively proved by Benjamini and Hochberg (1995) and Benjamini and Yekutieli
(2001), with a uniform $\pi$ and $\Lambda$ the cardinality measure. For a general volume
measure $\Lambda$ and a uniform $\pi$, the above result in the independent case was proved
by Benjamini and Hochberg (1997). A proof with a general $\pi$, $\Lambda$ the cardinality
measure and in the independent case was investigated by Genovese et al. (2006).

The interest of the present framework is to allow for a general and unified
version of these results with a concise proof (avoiding in particular the explicit
consideration of p-value reordering). We distinguish clearly between the two different
ways to obtain "weighted" versions of step-up procedures, by changing
respectively the choice of the volume measure $\Lambda$ or the weight function $\pi$. Both types of
weighting are of interest and of different nature; using weighted p-values can have
a large impact on power (Genovese et al., 2006; Roquain and van de Wiel, 2008;
see also above Remark 2.8), while using a volume $\Lambda$ different from the cardinality
measure can be of relevance for some application cases (see Benjamini and
Hochberg, 1997; Benjamini and Heller, 2007; and Remark 2.3 above). To our
knowledge, the two types of weighting had not been considered simultaneously
before; in particular and as noticed earlier (see Remark 2.8), in order to ensure
FDR control at level $\alpha$ under an arbitrary volume measure $\Lambda$, the appropriate
choice for a weight function $\pi$ is to take a density function with respect to $\Lambda$.
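
As an illustration, the $\pi$-weighted linear step-up procedure can be sketched in a few lines for the counting measure. The sketch assumes that the step-up procedure of Definition 3.1 rejects the level set $\{h : p_h \le \Delta(h, \hat r)\}$, where $\hat r$ is the largest $r$ for which this level set contains at least $r$ hypotheses; the function name `weighted_linear_step_up` and the toy p-values are hypothetical and are not taken from the text.

```python
def weighted_linear_step_up(pvals, weights, alpha):
    """Sketch of the pi-weighted linear step-up for the counting measure.

    Thresholds are Delta(h, r) = alpha * weights[h] * r. The procedure
    returns the level set L(r_hat), where r_hat is the largest r in
    {1, ..., m} such that |L(r)| >= r (empty set if no such r exists).
    No p-value reordering is needed.
    """
    m = len(pvals)
    for r in range(m, 0, -1):
        level_set = {h for h in range(m) if pvals[h] <= alpha * weights[h] * r}
        if len(level_set) >= r:
            return level_set
    return set()


# Toy data (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
uniform = [1 / len(pvals)] * len(pvals)           # recovers the usual linear step-up
tilted = [0.05, 0.05, 0.5] + [0.4 / 7] * 7        # more weight on hypothesis 2
rejected_uniform = weighted_linear_step_up(pvals, uniform, alpha=0.05)
rejected_tilted = weighted_linear_step_up(pvals, tilted, alpha=0.05)
```

With uniform weights $\pi(h) = 1/m$ the thresholds reduce to $\alpha r/m$, so the sketch coincides with the standard linear step-up; changing `weights` changes which hypotheses are favoured, but the weights are kept summing to one so that $\pi$ remains a density with respect to the counting measure.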

4.2. An extended family of step-up procedures under unspecified
dependencies

Similarly, in the case where the p-values have unspecified dependencies, we use
Proposition 3.7 instead of Proposition 3.6 to derive the following theorem:

Theorem 4.2. Consider $R$ the step-up procedure associated to the factorized
threshold collection $\Delta(h, r) = \alpha \pi(h) \beta_\nu(r)$, where the shape function $\beta_\nu$ can be
written in the form (6). Then $R$ has its ($\Lambda$-weighted) FDR controlled at level
$\Pi(\mathcal{H}_0)\alpha$.

Theorem 4.2 can be seen as an extension to the FDR of a celebrated inequality
due to Hommel (1983) for the family-wise error rate (FWER), which has been
widely used in the multiple testing literature (see, e.g., Lehmann and Romano,
2005; Romano and Shaikh, 2006a,b). Namely, when $\nu$ has its support in $\{1, \dots, m\}$
and $\mathcal{H} = \mathcal{H}_0$, the above result recovers Hommel's inequality. Note that the
latter special case corresponds to a "weak control", where we assume that all null
hypotheses are true; in this situation the FDR is equal to the FWER. Note also
that Theorem 4.2 generalizes without modification to a possibly continuous
hypothesis space, as will be mentioned in Section 4.4. The result of Theorem 4.2
initially appeared in a paper of Blanchard and Fleuret (2007), in a somewhat
different setting.

4.2.1. Discussion of the family of new shape functions

Theorem 4.2 establishes that, under arbitrary dependencies between the
p-values, there exists a family of step-up procedures with controlled false discovery
rate. This family is parametrized by the free choice of a distribution $\nu$ on the
positive real line, which determines the shape function $\beta_\nu$.

In the remainder of Section 4.2, we assume $\mathcal{H}$ to be finite of cardinality $m$,
endowed with the standard $\Lambda$ weighting, i.e., the counting measure. In this
situation, let us first remark that it is always preferable to choose $\nu$ with support
in $\{1, \dots, m\}$. To see this, notice that only the values of $\beta$ at integer values
$k$, $1 \le k \le m$, matter for the output of the algorithm. Replacing an arbitrary
distribution $\nu$ by the discretized distribution $\nu'(\{k\}) = \nu((k-1, k])$ for $k < m$
and $\nu'(\{m\}) = \nu((m-1, +\infty))$ results in a shape function $\beta'$ which is larger
than $\beta$ on the relevant integer range, hence the associated step-up procedure
is more powerful. This discretization operation will however generally result in
minute improvements only; continuous distributions can sometimes be easier to
handle and avoid cumbersome theoretical considerations.

Here are some simple possible choices for (discrete) $\nu$ based on power
functions $\nu(\{k\}) \propto k^\gamma$, $\gamma \in \{-1, 0, 1\}$:

• $\nu(\{k\}) = \gamma_m^{-1} k^{-1}$ for $k \in \{1, \dots, m\}$, with the normalization constant $\gamma_m =
\sum_{1 \le i \le m} \frac{1}{i}$. This yields $\beta(r) = \gamma_m^{-1} r$, and we recover the distribution-free
procedure of Benjamini and Yekutieli (2001).

• $\nu$ is the uniform distribution on $\{1, \dots, m\}$, giving rise to the quadratic shape function
$\beta(r) = r(r+1)/(2m)$. The obtained step-up procedure was proposed by
Sarkar (2008).

• $\nu(\{k\}) = 2k/(m(m+1))$ for $k \in \{1, \dots, m\}$ leads to $\beta(r) = r(r+1)(2r+1)/(3m(m+1))$.
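
The three closed forms above can be checked numerically. The following sketch assumes that expression (6) reads $\beta_\nu(r) = \int_0^r u \, d\nu(u)$, which for a discrete $\nu$ supported on $\{1, \dots, m\}$ becomes the finite sum $\sum_{k \le r} k\, \nu(\{k\})$; the helper names `shape_function` and `power_prior` are hypothetical. Exact rational arithmetic is used so that the comparison with the closed forms is an equality test rather than a floating-point approximation.

```python
from fractions import Fraction


def shape_function(nu, r):
    """beta_nu(r) = sum_{k <= r} k * nu({k}) for a discrete nu given as {k: mass}."""
    return sum(k * w for k, w in nu.items() if k <= r)


def power_prior(m, gamma):
    """nu({k}) proportional to k**gamma on {1, ..., m}, normalized exactly."""
    raw = {k: Fraction(k) ** gamma for k in range(1, m + 1)}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


m = 20
harmonic = sum(Fraction(1, i) for i in range(1, m + 1))  # gamma_m in the text

# Closed forms quoted in the three bullet points, indexed by the exponent gamma.
closed_forms = {
    -1: lambda r: Fraction(r) / harmonic,
    0: lambda r: Fraction(r * (r + 1), 2 * m),
    1: lambda r: Fraction(r * (r + 1) * (2 * r + 1), 3 * m * (m + 1)),
}

# For each exponent, compare the sum in (6) with the closed form at every integer r.
checks = {
    gamma: all(
        shape_function(power_prior(m, gamma), r) == formula(r)
        for r in range(1, m + 1)
    )
    for gamma, formula in closed_forms.items()
}
```

The same two helpers can be reused with any other discrete prior, for instance to compare candidate shape functions on the integer range before choosing one.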
Fig 2. For the standard $\Lambda$-weighting and $m = 1000$ hypotheses, this figure shows several
(normalized) shape functions $m^{-1}\beta$ associated to different distributions $\nu$ on $\mathbb{R}_+$
(according to expression (6)): Dirac distribution: $\nu = \delta_\mu$, with $\mu > 0$. (Truncated-) Gaussian
distribution: $\nu$ is the distribution of $\max(X, 1)$, where $X \sim \mathcal{N}(\mu, \sigma^2)$. Power
distribution: $d\nu(r) = r^\gamma \mathbf{1}\{r \in [1, m]\} dr / \int_1^m u^\gamma du$, $\gamma \in \mathbb{R}$. (Truncated-) Exponential distribution:
$d\nu(r) = (1/\lambda) \exp(-r/\lambda) \mathbf{1}\{r \in [0, m]\} dr$, with $\lambda > 0$. On each graph, for comparison
purposes we added the threshold function for Holm's step-down, $m^{-1}\beta(x) = 1/(m - x + 1)$
(small dots), and the linear thresholds $\beta(x) = x$ (large dots) and $\beta(x) = (\sum_{i \le m} i^{-1})^{-1} x$
(solid, also corresponding to the power distribution with $\gamma = -1$), corresponding to the
standard linear step-up and to the distribution-free linear step-up of Benjamini and Yekutieli
(2001), respectively.

On Figure 2, we plotted the shape functions corresponding to different choices
of distributions $\nu$ (which are actually continuous, i.e., without applying the
discretization procedure mentioned above). It is clear that the choice of $\nu$ has
a large impact on the final number of rejections of the procedure. However,
since no shape function uniformly dominates the others, there is no universally
optimal choice of $\nu$: the respective performances of these different procedures
will depend on the exact distribution $P$, and in particular on the number of
non-true hypotheses.

We like to think of $\nu$ as a kind of "prior" on the possible volumes of rejected
hypotheses. If we expect to have only a few rejected hypotheses, $\nu$ should be
concentrated on small values, and more spread out if we expect a significant
rejection proportion. This intuition is in accordance with a case of equality in
Hommel's inequality established by Lehmann and Romano (2005, Lemma 3.1
(ii)). In the situation studied there (a specifically crafted distribution $P$), it
can be checked that the distribution of the cardinality of the step-up procedure
$R$ using the shape function $\beta_\nu$, conditionally on $R \ne \emptyset$, is precisely $\nu$ in our
notation, while FDR($R$) is exactly $\alpha$.

As mentioned previously in Section 3.2.3, for any choice of $\nu$, the shape
function $\beta_\nu$ is always upper bounded by the linear shape function $\beta(x) = x$. The
only cases of equality are attained if $\nu$ is equal to a Dirac measure $\delta_{x_0}$ at a point
$x_0 \in \{1, \dots, m\}$: in this case $\beta_{\delta_{x_0}}(x_0) = x_0$ but $\beta_{\delta_{x_0}}(x) < x$ for any $x \ne x_0$.
Therefore, these procedures always reject fewer (or at most as many) hypotheses
than the linear step-up. Admittedly, this probably limits the practical
implications of this result, as we expect practitioners to prefer using the standard
linear step-up even if the theoretical conditions for its validity cannot be
formally checked in general. Additional conservativeness is the "price to pay" for
validity under arbitrary dependencies, although the above result shows that one
has, so to speak, a choice in the way this price is to be paid.

Finally, from the examples of shape functions drawn on Figure 2, the shape
functions based on exponential distributions $\nu$ seem particularly interesting;
they appear to exhibit a qualitatively diverse range of possible shape functions,
offering more flexibility than the Benjamini-Yekutieli procedure while not being
as committed as the Dirac distributions to a specific prior belief on the number
of rejected hypotheses.

4.2.2. Comparison to Bonferroni's and Holm's procedures

Observe that Bonferroni's procedure also belongs to the family presented here
(taking $\nu = \delta_1$), in the sense that a single-step procedure using a fixed threshold
can be technically considered as a step-up procedure. It is well-known, however,
that its control on type I error is much stronger than bounded FDR, namely
bounded FWER. To this extent, it is worth considering the question of whether
other rejection functions in the family, for which only the FDR is controlled,
are of interest at all. As remarked earlier, no shape function in the family
can uniformly dominate the others, and consequently there exist particular
situations where Bonferroni's procedure can be more powerful (i.e. reject more
hypotheses) than other members of the family. However, this case appears only
when there is indeed a very small number of rejections (i.e., when the signal
is extremely "sparse"). For instance, comparing the three examples mentioned
above to Bonferroni asymptotically as $m \to \infty$, we see that the corresponding
step-up procedures have a rejection function larger than Bonferroni's threshold
(and are therefore a posteriori more powerful than Bonferroni) provided
their number of rejections $|R|$ is larger than:

• $\Theta(\log m)$ for $\nu(k) \propto k^{-1}$ (Benjamini-Yekutieli procedure);

• $\Theta(\sqrt{m})$ for $\nu$ uniform;

• $\Theta(m^{2/3})$ for $\nu(k) \propto k$.

(Recall here that $\Theta(\cdot)$ means asymptotic order of magnitude, in other terms
"asymptotically lower and upper bounded, up to a constant factor".) In each
of the above cases, the largest proportion $u_m = |R|/m$ of rejections for which
Bonferroni's procedure would a posteriori have been more powerful tends to
zero as $m \to \infty$. An identical conclusion will hold if we compare these rejection
functions to that of Holm's step-down (Holm, 1979), since the latter is equivalent
to Bonferroni when $u_m \to 0$ (in addition, Holm's procedure is step-down while
the above procedures are step-up).
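
These orders of magnitude can be observed numerically. The sketch below computes, for each of the three discrete priors, the first integer $r$ at which the shape function reaches 1 (Bonferroni's shape function) and the corresponding proportion $r/m$. It assumes the closed-form shape functions listed in Section 4.2.1; the names `first_crossing` and `crossings`, and the dictionary keys, are hypothetical.

```python
def first_crossing(beta, m):
    """Smallest r in {1, ..., m} with beta(r) >= 1, or None if there is none."""
    for r in range(1, m + 1):
        if beta(r) >= 1:
            return r
    return None


def crossings(m):
    """First crossing point of Bonferroni's level for the three example priors."""
    gamma_m = sum(1.0 / i for i in range(1, m + 1))
    betas = {
        "BY": lambda r: r / gamma_m,                                   # nu(k) ~ 1/k
        "uniform": lambda r: r * (r + 1) / (2 * m),                    # nu uniform
        "linear": lambda r: r * (r + 1) * (2 * r + 1) / (3 * m * (m + 1)),  # nu(k) ~ k
    }
    return {name: first_crossing(beta, m) for name, beta in betas.items()}


# Crossing points and proportions u_m = r_m / m for increasing m.
table = {}
for m in (100, 10_000, 1_000_000):
    r_m = crossings(m)
    table[m] = {name: (r, r / m) for name, r in r_m.items()}
```

In `table`, the crossing points grow with $m$ while the proportions shrink, which is the behaviour described above: Bonferroni's threshold is only larger on a vanishing fraction of the possible rejection counts.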

More generally, let us exhibit a generic family of shape functions $\beta$ such that
$u_m$ tends to zero as $m \to \infty$. We first define the proportion $u_m$ for a given
shape function $\beta$ more formally, as $u_m = r_m/m$, where $r_m$ is the first point of
$\{1, \dots, m\}$ for which $\beta(\cdot)$ is above 1 (Bonferroni's shape function). Introduce the
family of scale-invariant shape functions $\beta$, that is, the $\beta$s that can be rewritten
under the form $\beta(r) = m \tilde\beta(r/m)$ for some fixed function $\tilde\beta(u) = \int_0^u v \, d\tilde\nu(v)$ and
fixed probability measure $\tilde\nu$ on $(0, 1]$. In the latter, $\tilde\nu$ should be taken
independently of $m$ as a "prior" on the proportion of rejections. (Equivalently, $\tilde\nu$ takes
the role of $\nu$ if we consider the following alternate scaling of the standard $\Lambda$-$\pi$
weighting: $\Lambda$ is the uniform probability measure on $\mathcal{H}$ and $\pi = 1$.) It is then
straightforward to check that $u_m$ tends to 0 as $m \to \infty$ if we choose $\tilde\nu$ such
that $\tilde\beta(u) > 0$ for all $u > 0$ (i.e. the origin is an accumulation point of the
support of $\tilde\nu$). This gives many examples of shape functions which outperform
Bonferroni's and Holm's procedures as $m$ grows to infinity in the "non-sparse"
case. For example, the "power function" choice $d\tilde\nu(u) = \mathbf{1}\{u \in [0, 1]\}(\gamma + 1)u^\gamma du$
for $\gamma > -1$ gives rise to the rescaled shape function $\tilde\beta(u) = \frac{\gamma+1}{\gamma+2} u^{\gamma+2}$ and thus
$\beta(r) = \frac{\gamma+1}{\gamma+2} \frac{r^{\gamma+2}}{m^{\gamma+1}}$. In the cases $\gamma = 0, 1$, note that the latter corresponds to the
functions $\beta$ considered earlier (up to discretization).
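
As a worked illustration of the power-function example, the crossing point $r_m$ can be computed explicitly. The derivation below treats $r$ as a real variable (discretization is ignored, which is a simplifying assumption made here), writes $\tilde\beta$ and $\tilde\nu$ for the rescaled shape function and prior, and introduces the constant $c_\gamma$ purely as notation for this illustration.

```latex
\begin{align*}
\tilde\beta(u) &= \int_0^u v\,(\gamma+1)\,v^{\gamma}\,dv
               = \frac{\gamma+1}{\gamma+2}\,u^{\gamma+2},\\
\beta(r) &= m\,\tilde\beta(r/m)
          = \frac{\gamma+1}{\gamma+2}\,\frac{r^{\gamma+2}}{m^{\gamma+1}},\\
\beta(r) \ge 1 &\iff r \ge c_\gamma\, m^{\frac{\gamma+1}{\gamma+2}},
   \qquad c_\gamma := \Big(\frac{\gamma+2}{\gamma+1}\Big)^{\frac{1}{\gamma+2}},\\
u_m = \frac{r_m}{m} &\approx c_\gamma\, m^{-\frac{1}{\gamma+2}} \longrightarrow 0
   \quad (m \to \infty).
\end{align*}
```

For $\gamma = 0$ this gives $r_m$ of order $\sqrt{2m}$, in line with the $\sqrt{m}$ order quoted above for the uniform prior, and for $\gamma = 1$ it gives $r_m$ of order $m^{2/3}$.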

By contrast, one can easily check that there is no scale-invariant linear
rejection function satisfying (6): the Benjamini-Yekutieli procedure would correspond
(up to lower order terms introduced by discretization) to the "truncated" prior
$d\tilde\nu(u) = (\log m)^{-1} \mathbf{1}\{m^{-1} \le u \le 1\} u^{-1} du$, which cannot be extended to the
origin independently of $m$ since $u \mapsto u^{-1}$ is not integrable at 0. We have seen above
that $u_m \to 0$ nevertheless also holds for this procedure: hence scale-invariant
shape functions are certainly not the only candidates in the family to
asymptotically outperform Bonferroni's and Holm's procedures in the "non-sparse" case.

For comparison with several other possible choices of $\nu$ (and for a finite
$m = 1000$), we have systematically added Holm's rejection function to the plots
of Figure 2. This leads to a qualitatively similar conclusion.

4.3. Adaptive step-up procedures

We now give a very simple application of our results in the framework of adaptive
step-up procedures. Observe that the FDR control obtained for classical step-up
procedures is in fact not at the target level $\alpha$, but rather at the level $\pi_0 \alpha$, where
$\pi_0 = \Pi(\mathcal{H}_0)$ is the "weighted volume" of the set of true null hypotheses (equal
to the proportion of true null hypotheses $m_0/m$ in the standard case). This
motivates the idea of first estimating $\pi_0^{-1}$ from the data using some estimator
$G(\mathbf{p})$, then applying the step-up procedure with the modified shape function
$G(\mathbf{p})\beta$. Because this function is now data-dependent, establishing FDR
control for the resulting procedure is more delicate; it is the subject of numerous
recent works (see, e.g., Black, 2004; Benjamini et al., 2006; Finner et al., 2008;
see also Gavrilov et al., 2008 for an adaptive step-down procedure).

In this context we prove the following simple result, which is valid under the 
different types of dependency conditions: 

Lemma 4.3. Assume either of the following conditions is satisfied:

• the p-values $(p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$, and $\beta$ is the identity function.

• the p-values have unspecified dependencies and $\beta$ is a function of the
form (6).

Define $R$ as an adaptive step-up procedure using the data-dependent threshold
collection $\Delta(h, r, \mathbf{p}) = \alpha_1 \pi(h) G(\mathbf{p}) \beta(r)$, where $G(\mathbf{p})$ is some estimator of $\pi_0^{-1}$,
assumed to be nondecreasing as a function of the p-values. Then the following
inequality holds:

$$\mathrm{FDR}(R) \le \alpha_1 + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|} \mathbf{1}\{G(\mathbf{p}) > \pi_0^{-1}\} \right]. \qquad (8)$$

Proof. Consider $\tilde{R}$ the modified step-up procedure using the data-dependent
threshold collection $\alpha_1 \pi(h) \max(\pi_0^{-1}, G(\mathbf{p})) \beta(r)$. Then it is easy to check that
$\tilde{R}$ satisfies the self-consistency condition SC($\alpha_1 \pi_0^{-1}, \pi, \beta$). Furthermore, $\tilde{R}$ is a
nondecreasing set as a function of the p-values, by the hypothesis on $G$. Therefore,
by combining Proposition 2.7 with Proposition 3.6 (resp. Proposition 3.7),
$\tilde{R}$ has its FDR controlled at level $\pi_0 (\alpha_1 \pi_0^{-1}) = \alpha_1$ in both dependency situations
and we have

$$\mathrm{FDR}(R) = \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|} \mathbf{1}\{|R| > 0\} \right]$$

$$\le \mathbb{E}\left[ \frac{|\tilde{R} \cap \mathcal{H}_0|}{|\tilde{R}|} \mathbf{1}\{|\tilde{R}| > 0\} \right] + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|} \mathbf{1}\{G(\mathbf{p}) > \pi_0^{-1}\} \right]$$

$$\le \alpha_1 + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|} \mathbf{1}\{G(\mathbf{p}) > \pi_0^{-1}\} \right].$$

□

Incidentally, the above proof illustrates a technical use of the main result
where the inclusion in the self-consistency condition is generally not an equality.

We can apply Lemma 4.3 when considering a so-called two-stage procedure,
where $\pi_0$ is estimated using a preliminary multiple testing procedure $R_0$. We
assume here that this first stage has controlled FWER (e.g. Holm's step-down).

Corollary 4.4. Let $R_0$ be a multiple testing procedure with $\mathrm{FWER}(R_0) :=
\mathbb{P}(\mathcal{H}_0 \cap R_0 \ne \emptyset)$ controlled at level $\alpha_0$. Estimate $\pi_0$ by $\hat\pi_0 = \Pi((R_0)^c) =
\sum_{h \notin R_0} \pi(h) \Lambda(\{h\})$, the $\Pi$-volume of hypotheses not rejected by the first stage,
and put $G(\mathbf{p}) = \hat\pi_0^{-1}$ (defined as $+\infty$ when $\hat\pi_0 = 0$).

Then the adaptive step-up procedure $R$ using the data-dependent threshold
collection $\Delta(h, r, \mathbf{p}) = \alpha_1 \pi(h) G(\mathbf{p}) \beta(r)$ satisfies

$$\mathrm{FDR}(R) \le \alpha_0 + \alpha_1.$$

The proof is a direct application of Lemma 4.3: the second term in (8) is
upper bounded by $\mathbb{P}(G(\mathbf{p}) > \pi_0^{-1}) = \mathbb{P}(\Pi((R_0)^c) < \Pi(\mathcal{H}_0))$, which is itself
smaller than or equal to $\mathbb{P}(\mathcal{H}_0 \cap R_0 \ne \emptyset)$, the FWER of the first stage. Note
that in the standard situation where $\Lambda = |\cdot|$ is the counting measure and $\pi$ is
uniform, the above estimator of $\pi_0^{-1} = m/m_0$ is simply $m/\hat{m}_0$, where $\hat{m}_0$ is the
number of hypotheses not rejected by the first stage.
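
In the standard situation (counting measure, uniform $\pi$, linear $\beta$), the two-stage procedure can be sketched as follows. The first stage is taken to be Holm's step-down, as suggested in the text; the function names `holm_rejections`, `linear_step_up` and `two_stage`, as well as the toy p-values, are hypothetical. The sketch only illustrates the mechanics and does not by itself establish any error control.

```python
def holm_rejections(pvals, alpha0):
    """First stage: Holm's step-down, returning the set of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):          # rank = 0, 1, ..., m - 1
        if pvals[i] <= alpha0 / (m - rank):
            rejected.add(i)
        else:
            break                             # step-down stops at the first failure
    return rejected


def linear_step_up(pvals, level):
    """Linear step-up with thresholds level * r / m; returns rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    r_hat = 0
    for r in range(1, m + 1):
        if pvals[order[r - 1]] <= level * r / m:
            r_hat = r                         # keep the largest r that passes
    return set(order[:r_hat])


def two_stage(pvals, alpha0, alpha1):
    """Second stage: step-up at level alpha1 * G, with G = m / m0_hat."""
    m = len(pvals)
    m0_hat = m - len(holm_rejections(pvals, alpha0))
    if m0_hat == 0:
        return set(range(m))                  # G = +infinity: all thresholds are infinite
    return linear_step_up(pvals, alpha1 * m / m0_hat)


# Toy data (illustrative only): six very small p-values out of ten.
pvals = [1e-5] * 6 + [0.04, 0.5, 0.6, 0.9]
adaptive = two_stage(pvals, alpha0=0.025, alpha1=0.025)
plain = linear_step_up(pvals, level=0.05)
```

On this toy input the first stage rejects six of the ten hypotheses, so $G = 10/4 = 2.5$ and the second stage runs at level $0.0625$, which is larger than the level $0.05$ of the non-adaptive procedure.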

Because of the loss in the level introduced by the first stage, the latter result
is admittedly not extremely sharp: for example, putting $\alpha_0 = \alpha_1 = \alpha/2$, a
theoretical improvement over the non-adaptive version at level $\alpha$ is obtained
only when more than 50% of hypotheses are rejected in the first stage. However,
while sharper results are available under the assumption of independent p-values
(see, e.g., Benjamini et al., 2006), to our knowledge there are almost no
results addressing the case of dependent p-values (as is the case in the above
result). The results we know of for this case are found in works of Sarkar (2008)
and Farcomeni (2007). The latter reference establishes a result similar to the
above one, but seems to make the implicit assumption that the two stages are
independent, which we are not assuming here. A more extensive treatment of the
question of adaptive procedures following the general principles exposed
in the present work, including other applications of Lemma 4.3, is proposed
by Blanchard and Roquain (2008a) (see also the second author's PhD thesis,
Roquain, 2007, Chap. 11).

4.4. FDR control over a continuous space of hypotheses

An interesting feature of the approach advocated here for proving FDR control
is that it can be readily adapted to the case where $\mathcal{H}$ is a continuous set of
hypotheses. A simple example where this situation arises theoretically is
when the underlying observation is modelled as a random process $W$ over a
continuous space $T$, and the goal is to test for each $t \in T$ whether $\mathbb{E}[W(t)] = 0$.
In this case we can identify $\mathcal{H}$ with $T$. Such a setting was considered for example
by Perone Pacifico et al. (2004).

In order to avoid straying too far from our main message in the present work,
we decided to postpone the detailed exposition of this point to a separate
note. We refer the interested reader to Section 5 of the technical report of
Blanchard and Roquain (2008b), and restrict ourselves here to a brief overview.
First, under appropriate (and tame) measurability assumptions, the framework
developed in this paper carries over without change: in the FDR definition,
instead of using the cardinality measure (which is of course not adapted to the
continuous case), we are able to deal with an arbitrary "volume measure" $\Lambda$ on
$\mathcal{H}$ (such as the Lebesgue measure if $\mathcal{H}$ is a compact subset of $\mathbb{R}^d$). Also, while
it seems considerably more difficult to define rigorously step-up procedures in
the traditional sense via reordering of the p-values, Definition 3.1 of a step-up
procedure carries over to a continuous setting.

Secondly, our main tool, Proposition 2.7, remains true when $\mathcal{H}$ is continuous,
by replacing each sum over $\mathcal{H}$ by the corresponding integral (with respect to
the measure $\Lambda$). Thirdly comes the question of how to adapt the three types of
dependency conditions considered in Section 3.2 to a continuous setting. Under
unspecified dependencies, there is nothing to change as our arguments are not
specific to the discrete setting. The independent case, on the other hand,
cannot be adapted to the continuous setting as it conflicts with some measurability
assumptions. However, this case is mainly irrelevant in a continuous setting,
as continuous families of independent random variables are not usually
considered. Finally, in the case of positive dependencies, condition (7) still ensures
the dependency control condition since Lemma 3.2 is valid for arbitrary
variables, not necessarily discrete. The main difficulty is therefore to suitably adapt
the PRDS assumption to the continuous setting. We propose two extensions of
the PRDS condition, namely the "strong continuous PRDS", which is a direct
adaptation of the finite PRDS definition to a continuous setting, and the "weak
continuous PRDS", which states that any finite subfamily of p-values should be
(finite) PRDS. The strong continuous PRDS condition is sufficient but arguably
not easy to check, while the weak PRDS condition is easier to check but requires
some additional requirements on the procedure $R$ to ensure condition DC. An
example of a process satisfying either type of condition is a continuous Gaussian
process with a positive covariance operator.

4.5. Other types of procedures

We want to point out that the approach advocated here also provides FDR
control for procedures more general than step-up. For example, as mentioned
at the end of Section 3.1, generalized step-up-down procedures satisfy a
self-consistency property. Therefore, combining Proposition 2.7 with Proposition 3.6
(PRDS case) and Proposition 3.7 (unspecified dependencies), we obtain the
following result:

Theorem 4.5. Assume either of the following conditions is satisfied:

• the p-values $(p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$, and $\beta$ is the identity function.

• the p-values have unspecified dependencies and $\beta$ is of the form (6).

Then the generalized step-up-down procedure of any order $\lambda \in [0, |\mathcal{H}|]$ and
associated to the threshold collection $\Delta(h, r) = \alpha \pi(h) \beta(r)$ has its FDR controlled at
level $\alpha \Pi(\mathcal{H}_0)$.

In the PRDS case and with the standard $\Lambda$-$\pi$ weighting, the first point of
the above result was first proved by Sarkar (2002) (see also Finner et al.,
2008, where an approach related to ours is used to prove the same result; this
is discussed in more detail below in Section 5.1). The contribution of the above
result is to deal with possible $\Lambda$-$\pi$ weighting and with the general dependent
case (in particular, note that this theorem contains both Theorem 4.1 and
Theorem 4.2). We emphasize that the latter result does not come trivially from the
fact that a step-up-down procedure is always a subset of the step-up procedure
using the same threshold collection, because in the FDR expression the
numerator and the denominator inside the expectation both decrease with the rejection
set size.

It could however legitimately be objected that only step-up procedures are
really of interest in the present context, since they are less conservative than
step-up-down, and even the least conservative possible under the SC condition,
as argued in Section 3.1. But one interest of the self-consistency condition is to
allow more flexibility, in particular if there are additional constraints to be taken
into account. Consider the following plausible scenario: in a medical imaging
context, the user wants to enforce additional geometrical constraints on the set
$R$ of rejected hypotheses, represented as a 2D set of pixels. For example, one
could demand that $R$ be convex or have only a limited number of connected
components. If such additional constraints come into play, the step-up may not
be admissible, and has to be replaced by a subset satisfying the constraints. In
this case, the flexibility introduced by the SC condition will be useful in order
to give a simple criterion sufficient to establish FDR control without necessarily
having to engineer a new proof for each new specific algorithm. Note in particular
that in such a scenario, one would probably like to choose a maximal rejection
set satisfying both the geometric constraints and the self-consistency condition; in
this case the resulting procedure cannot be characterized in general as a
step-up-down procedure, and the SC condition might hold without equality, i.e.
$R \subsetneq L_\Delta(|R|)$.

4.6. Another application of condition DC($\beta$)

In this section, we step outside of the framework used in Proposition 2.7; more
precisely, we present another application of condition DC($\beta$) to study the FDR
of a step-down procedure that does not satisfy the self-consistency condition
with respect to the adequate shape function. We will prove that the step-down
procedure proposed by Benjamini and Liu (1999a) and Romano and Shaikh
(2006a) has a controlled FDR under a PRDS-type assumption relating the p-values
of $\mathcal{H}_1$ to those of $\mathcal{H}_0$; we also deduce a straightforward generalization to the
unspecified dependencies case. In this section, we only consider $\Lambda$ equal to the
counting measure, so that the aim is to control the standard FDR.

Benjamini and Liu (1999a) and Romano and Shaikh (2006a) introduced the
step-down procedure based on the threshold collection $\Delta(i) = \frac{\alpha m}{(m-i+1)^2}$, and showed
that it has controlled FDR at level $\alpha$ if for each $h_0 \in \mathcal{H}_0$, $p_{h_0}$ is independent of
the collection of p-values $(p_h, h \in \mathcal{H}_1)$ (in fact Romano and Shaikh, 2006a used
a slightly weaker assumption, but it reduces to independence when the p-values
of true null hypotheses are uniform on $[0, 1]$). Here, we prove this result under
a weaker assumption, namely a positive regression dependency assumption of the
p-values of $\mathcal{H}_1$ from those of $\mathcal{H}_0$. Let us reformulate slightly the notion of "PRDS
on $\mathcal{H}_0$" given in Definition 3.5. We say that the p-values $(p_h, h \in \mathcal{H}_1)$ are
positively regression dependent from each one in a separate set $\mathcal{H}_0$ (for short:
$\mathcal{H}_1$ PRDSS on $\mathcal{H}_0$) if for any measurable nondecreasing set $D \subset [0, 1]^{\mathcal{H}_1}$ and for
all $h_0 \in \mathcal{H}_0$, the function

$$u \mapsto \mathbb{P}\big((p_h)_{h \in \mathcal{H}_1} \in D \mid p_{h_0} = u\big)$$

is nondecreasing. Note that the latter condition is obviously satisfied when for
all $h_0 \in \mathcal{H}_0$, $p_{h_0}$ is independent of $(p_h, h \in \mathcal{H}_1)$. We chose to introduce a new
acronym only to emphasize the fact that, contrary to the standard PRDS, this
assumption does not put constraints on the inner dependency structure of the
p-value vector of true hypotheses.

Theorem 4.6. Suppose that the p-values of $\mathcal{H}_1$ are PRDSS on $\mathcal{H}_0$. Then the
step-down procedure of threshold collection $\Delta(i) = \frac{\alpha m}{(m-i+1)^2}$ has a FDR less
than or equal to $\alpha$.

If $\beta$ is a shape function of the form (6), then without any assumptions on
the dependency of the p-values, the step-down procedure of threshold collection
$\Delta(i) = \frac{\alpha m}{m-i+1} \beta\left(\frac{1}{m-i+1}\right)$ has a FDR less than or equal to $\alpha$.

The proof is found in the appendix. Essentially, we followed the proof of
Benjamini and Liu (1999a) and identified the point where the condition DC($\beta$)
(along with the results of Lemma 3.2) can be used instead of their argument.

Benjamini and Liu (1999b) proposed a slightly less conservative step-down
procedure: the step-down procedure with the threshold collection
$\Delta(i) = 1 - \left[1 - \min\left(1, \frac{\alpha m}{m-i+1}\right)\right]^{1/(m-i+1)}$. It was proved by Benjamini and Liu (1999b) that this
procedure controls the FDR at level $\alpha$ as soon as the p-values are independent.
More recently, a proof of this result was given by Sarkar (2002) when the
p-values are MTP$_2$ (see the definition there) and if the p-values corresponding to
true null hypotheses are exchangeable. However, the latter conditions are more
restrictive than the PRDSS assumption of Theorem 4.6.

The procedure of Theorem 4.6 is often more conservative than the LSU
procedure: first because the LSU procedure is a step-up procedure, and secondly
because the threshold collection of the LSU procedure is larger on a substantial
range. However, in some specific cases ($m$ small and a large number of rejections),
the threshold collection of Theorem 4.6 can be larger than the one of the LSU
procedure. A similar argument can be made when comparing the proposed
modified step-down under unspecified dependencies to (for example) the modified
LSU procedure of Benjamini and Yekutieli (2001).
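
The range on which the step-down thresholds exceed the linear step-up thresholds can be made explicit with a short computation. The sketch below assumes the threshold collection $\alpha m/(m-i+1)^2$ for the step-down procedure of Theorem 4.6 and $\alpha i/m$ for the linear step-up (LSU) procedure; the function names are hypothetical. Since $\alpha$ cancels in the comparison, the ranks are found with exact integer arithmetic.

```python
def step_down_threshold(i, m, alpha):
    """Threshold alpha * m / (m - i + 1)**2 at rank i (assumed form, see lead-in)."""
    return alpha * m / (m - i + 1) ** 2


def linear_step_up_threshold(i, m, alpha):
    """Threshold alpha * i / m of the linear step-up procedure at rank i."""
    return alpha * i / m


def ranks_where_step_down_is_larger(m):
    """Ranks i in {1, ..., m} where the step-down threshold is strictly larger.

    alpha * m / (m - i + 1)**2 > alpha * i / m  is equivalent to
    m**2 > i * (m - i + 1)**2, so alpha plays no role.
    """
    return [i for i in range(1, m + 1) if m * m > i * (m - i + 1) ** 2]


# Fraction of ranks on which the step-down threshold dominates, for two sizes.
fractions = {m: len(ranks_where_step_down_is_larger(m)) / m for m in (10, 100)}
```

For both sizes the step-down threshold is larger only on the last few ranks, and the fraction of such ranks is larger for the smaller $m$, which matches the qualitative statement above.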
In order to use Theorem 4.6 in the unspecified dependencies case, we have to
choose a "prior" $\nu$ on the set $\{1/k : 1 \le k \le m\}$:

• taking a uniform $\nu$ yields $\Delta(i) = \frac{\alpha}{m-i+1}\left(\frac{1}{m-i+1} + \dots + \frac{1}{m}\right)$,

• taking $\nu(\{1/k\}) \propto k$ results in the threshold function $\Delta(i) = \frac{2\alpha i}{(m+1)(m-i+1)}$,

• taking $\nu(\{1/k\}) \propto 1/k$ results in $\Delta(i)$ equal to
$\gamma_m^{-1} \frac{\alpha m}{m-i+1}\left(\frac{1}{(m-i+1)^2} + \dots + \frac{1}{m^2}\right)$, with $\gamma_m = \sum_{i \le m} \frac{1}{i}$.
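
The thresholds for a given prior can be computed directly from the definition. The sketch below assumes that expression (6) reads $\beta(r) = \int_0^r u \, d\nu(u)$ and that the step-down thresholds of Theorem 4.6 under unspecified dependencies are $\Delta(i) = \frac{\alpha m}{m-i+1}\beta\left(\frac{1}{m-i+1}\right)$; a prior on $\{1/k\}$ is stored as a dictionary indexed by $k$, and all function names are hypothetical.

```python
from fractions import Fraction


def normalize(raw):
    """Turn nonnegative masses {k: w} into a probability distribution."""
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def beta_at_inverse(nu, j):
    """beta(1/j): sum over atoms 1/k <= 1/j (i.e. k >= j) of (1/k) * nu({1/k})."""
    return sum(Fraction(1, k) * w for k, w in nu.items() if k >= j)


def step_down_thresholds(nu, m, alpha):
    """Delta(i) = alpha * m / (m - i + 1) * beta(1 / (m - i + 1)), i = 1, ..., m."""
    return [
        alpha * m / (m - i + 1) * beta_at_inverse(nu, m - i + 1)
        for i in range(1, m + 1)
    ]


m, alpha = 12, Fraction(1, 20)
priors = {
    "uniform": normalize({k: Fraction(1) for k in range(1, m + 1)}),
    "prop_k": normalize({k: Fraction(k) for k in range(1, m + 1)}),
    "prop_inv_k": normalize({k: Fraction(1, k) for k in range(1, m + 1)}),
}
thresholds = {name: step_down_thresholds(nu, m, alpha) for name, nu in priors.items()}
```

Because $\beta(x) \le x$ for any shape function of the form (6), each of these threshold collections is bounded by $\alpha m/(m-i+1)^2$, the collection used under the PRDSS assumption.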

5. Discussion and conclusion 

5.1. The self-consistency condition and connection with other works 

The self-consistency condition with a linear shape function can be related to
the following heuristic motivation: consider the problem of choosing a threshold
for rejected p-values, which we reformulate equivalently as choosing $r$ such that
$L_\Delta(r)$ has a FDR smaller than $\alpha$ (for the linear threshold collection $\Delta(h, r) =
\alpha r/m$). If the final number of rejections $|L_\Delta(r)|$ was equal to a deterministic
constant $C(r)$, we would have a FDR bounded by

$$\mathbb{E}\big[ |\{h \in \mathcal{H}_0 \mid p_h \le \alpha r/m\}| \big] / C(r) \le \alpha r / C(r),$$

so that the desired FDR control would be attained if $r \le C(r) = |L_\Delta(r)|$,
that is, when $L_\Delta(r)$ satisfies the self-consistency condition. This reasoning is, of
course, not rigorous since $L_\Delta(r)$ is in fact a random variable (and we need other
arguments to correctly prove the FDR control, e.g. Lemma 3.2). This point of
view is in the same spirit as the post-hoc interpretation of the classical linear
step-up procedure proposed in Section 3.3 of Benjamini and Hochberg (1995),
where the authors remarked that the linear step-up procedure maximizes the
number of rejected hypotheses under the above constraint, which is the property
we used in Definition 3.1.

As mentioned in the introduction and in Section 4.5, the forthcoming paper
of Finner et al. (2008) introduces a condition quite similar to the self-consistency
condition (although formulated differently). Precisely, condition (T2) of
Finner et al. (2008) can be seen to be equivalent to $R = L_\Delta(|R|)$ in our
notation (in the specific case of a linear threshold collection $\Delta$ and for the standard
$\Lambda$-$\pi$ weighting). It is proved in Theorem 4.1 of Finner et al. (2008) that (T2)
implies FDR control in the PRDS case (or more precisely, when (7) holds). The
authors note that the corresponding proof unifies and simplifies classical results
and proofs. The present work, developed independently, led to a very similar
conclusion. In particular, Finner et al. (2008) note that their result covers in
general the step-up-down procedures satisfying (T2), which is essentially the
same as the first point of the present Theorem 4.5 (for the standard $\Lambda$- and
$\pi$-weighting).

As an additional contribution, we introduced the "abstract" dependency
condition DC, which allowed us to increase the range where the self-consistency
condition can be used, in particular when the p-values have unspecified
dependencies. We also included $\Lambda$- and $\pi$-weighting in our results; the formulation we
adopted allows in particular for an easy extension to infinite, possibly continuous
hypothesis spaces. Other original applications were exposed in Section 4.

Conversely, Finner et al. (2008) used their approach for different applications
of interest, based on an asymptotically optimal rejection curve. Several step-up
or step-up-down procedures are proposed by Finner et al. (2008) based on
variations on this rejection curve and shown to have an asymptotic and adaptive
(in the sense of Section 4.3) control of the FDR (related to this is also the
step-down procedure of Gavrilov et al. (2008), based on the same curve and shown
to enjoy non-asymptotic control of the FDR). These results do not fit directly
into the framework delineated in the present paper, but some of the technical
tools used in their proof are of a similar spirit. A full technical development
on this topic is out of the scope of the present work, but we demonstrate in a
separate work (Blanchard and Roquain, 2008a) that the two conditions we
presented here (along with some additional key ideas coming from Benjamini et al.,
2006) can be used to prove (non-asymptotic) FDR control under independence,
for an adaptive procedure based on a rejection curve analogous to that
considered by Finner et al. (2008) and Gavrilov et al. (2008). In this regard, let us
also mention the recent work of Neuvial (2008), which compares a number of
these related procedures in terms of their asymptotic power.

Finally, we mention that the self-consistency condition presented here has
a slightly weaker form than condition (T2) of Finner et al. (2008), namely it
is $R \subset L_\Delta(|R|)$ instead of $R = L_\Delta(|R|)$. From a technical point of view, we
note here that the argument of Finner et al. (2008) can actually be adapted
straightforwardly to accommodate the weaker condition. Is the weaker form of
the condition of interest at all? While the stricter condition is sufficient to
cover the case of step-up and step-up-down procedures, in the present work we
have also tried to demonstrate that the weaker form is not purely anecdotal
but useful in some other applications: first for truncated threshold collections
(proof of Lemma 4.3), and secondly in Section 4.5, where we mentioned plausible
practical scenarios where equality might not hold due to additional constraints.
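
To make the difference between the two forms of the condition concrete, here is a minimal Python sketch. It assumes the simplest setting (counting measure for the volume, uniform weights $\pi(h) = 1/m$ and a linear shape function), so that $L_\Delta(r) = \{h : p_h \le \alpha r/m\}$. The function names and the p-values are illustrative assumptions and do not come from the text above.

```python
def level_set(pvalues, alpha, r):
    """Indices h with p_h <= alpha * r / m, i.e. L_Delta(r) for linear thresholds."""
    m = len(pvalues)
    return {h for h, p in enumerate(pvalues) if p <= alpha * r / m}


def is_self_consistent(pvalues, rejected, alpha):
    """Weak form of the condition: R is included in L_Delta(|R|)."""
    return set(rejected) <= level_set(pvalues, alpha, len(rejected))


def satisfies_equality(pvalues, rejected, alpha):
    """Strict form, in the spirit of condition (T2): R equals L_Delta(|R|)."""
    return set(rejected) == level_set(pvalues, alpha, len(rejected))


# Hypothetical p-values, chosen only for illustration.
pvalues = [0.001, 0.002, 0.003, 0.9]
alpha = 0.05

full_set = {0, 1, 2}     # every p-value below alpha * 3 / 4 = 0.0375
constrained = {0, 1}     # e.g. an external constraint forbids rejecting index 2
too_large = {0, 1, 2, 3}  # includes a p-value above alpha * 4 / 4 = 0.05
```

In this toy example `full_set` satisfies the equality, `constrained` satisfies only the inclusion, and `too_large` satisfies neither, which is the kind of situation the weaker form is meant to accommodate.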

5.2. Conclusion 

The approach advocated in this paper to establish FDR bounds introduced
a clear distinction between two sufficient conditions of a different nature: on
the one hand, the self-consistency condition, which is purely algorithmic, and
on the other hand, the (essentially probabilistic) dependency control condition.
The two conditions are effectively coupled via the common choice of the shape
function $\beta$ appearing in both. The fundamental result of this paper is that these
two conditions suffice for FDR control, but part of our message is that this point
of view also introduced some relevant technical tools, which, abstracting some
key arguments present in previous works, can be of use in various other settings.

While these conditions are only sufficient and hence certainly not universal,
we illustrated their interest by recovering in Sections 4.1, 4.2 and 4.5 several
existing results of the FDR multiple testing literature in a unified way, in
particular for any arbitrary combination of the following factors:

• arbitrary $\Lambda$-weighting of the FDR via the volume measure,

• arbitrary $\pi$-weighting of the p-values via the weight function,

• arbitrary choice of dependency setting: independent, PRDS or unspecified,

• in the unspecified dependencies setting, arbitrary choice of the shape function
$\beta$ satisfying (6),

• in the procedure algorithm, arbitrary choice between "step-down",
"step-up" and "step-up-down", and more generally arbitrary choice among
the possible orders $\lambda$ in a "step-up-down" procedure.

In the past literature, many results have been established for specific combinations
of the above variations; here we were able to cover all of these at once,
possibly covering combinations that had not been explicitly considered earlier
(in particular, the fourth "factor" above seems to be new). Several other
applications were proposed.
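
As an illustration of how some of these factors combine, the following Python sketch implements a weighted step-up procedure for a finite set of hypotheses. It is only a sketch under stated assumptions: the counting measure is used for the volume, thresholds are taken of the form $\alpha\,\pi(h)\,\beta(r)$, and the step-up rule is taken to be $R = L_\Delta(\hat r)$ with $\hat r$ the largest $r$ such that $|L_\Delta(r)| \ge r$. The function names, the brute-force search and the data are ours. The second shape function corresponds to a distribution $\nu$ with mass proportional to $1/k$ on $\{1, \dots, m\}$, for which $\beta(r) = \sum_{k \le r} k\,\nu(\{k\}) = r/(1 + 1/2 + \dots + 1/m)$.

```python
def step_up(pvalues, alpha, weights=None, beta=lambda r: r):
    """Weighted step-up sketch: return (rejected indices, r_hat).

    Thresholds are alpha * weights[h] * beta(r); weights default to 1/m.
    """
    m = len(pvalues)
    if weights is None:
        weights = [1.0 / m] * m

    def level_set(r):
        return [h for h in range(m) if pvalues[h] <= alpha * weights[h] * beta(r)]

    # r_hat = largest r in {0, ..., m} with |L(r)| >= r (r = 0 always qualifies).
    r_hat = max(r for r in range(m + 1) if len(level_set(r)) >= r)
    return level_set(r_hat), r_hat


def harmonic_shape(m):
    """Shape function obtained from nu({k}) proportional to 1/k on {1, ..., m}."""
    h_m = sum(1.0 / k for k in range(1, m + 1))
    return lambda r: min(int(r), m) / h_m


pvalues = [0.001, 0.012, 0.018, 0.3, 0.8]  # hypothetical data
alpha = 0.05
linear_rejections, _ = step_up(pvalues, alpha)
harmonic_rejections, _ = step_up(pvalues, alpha, beta=harmonic_shape(len(pvalues)))
```

On this toy data the linear shape function rejects the three smallest p-values, while the more conservative harmonic shape function (suited to unspecified dependencies) rejects only the smallest one.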

An interesting direction for future work is to try to "adapt" the choice of the
weight function $\pi$ (and possibly also the distribution $\nu$ in the case of unknown
dependencies) depending on the observed data. Because these parameters have
a crucial influence on power, doing so in a principled way might result in a
substantial improvement.

Appendix 



Appendix A: Proof of Lemma 3.2 



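Part (i). We want to establish the following inequality:

\[
\mathbb{E}\left[\frac{\mathbf{1}\{U \le c\,g(U)\}}{g(U)}\right] \le c ,
\]

for $U$ stochastically lower bounded by a uniform distribution and $g$ nonincreasing.
Let $\mathcal{U} = \{u \mid c\,g(u) \ge u\}$, $u^* = \sup \mathcal{U}$ and $C^* = \inf\{g(u) \mid u \in \mathcal{U}\}$. It is not
difficult to check that $u^* \le cC^*$ (for instance take any nondecreasing sequence
$u_n \in \mathcal{U} \nearrow u^*$, so that $g(u_n) \searrow C^*$). If $C^* = 0$, then $u^* = 0$ and the result is
trivial. Otherwise, we have

\[
\mathbb{E}\left[\frac{\mathbf{1}\{U \le c\,g(U)\}}{g(U)}\right]
\le \frac{\mathbb{P}(U \in \mathcal{U})}{C^*}
\le \frac{\mathbb{P}(U \le u^*)}{C^*}
\le \frac{u^*}{C^*}
\le c .
\]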
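
As a sanity check of part (i), consider the following worked example. The choice of $g$ is ours and is not part of the original argument: take $U$ uniform on $[0,1]$, $g(u) = 2 - u$ (nonincreasing and positive on $[0,1]$) and $c \in (0,1]$.

```latex
\begin{align*}
\mathcal{U} &= \{u \in [0,1] : c\,(2-u) \ge u\} = \Bigl[0, \tfrac{2c}{1+c}\Bigr],
\qquad u^* = \frac{2c}{1+c},
\qquad C^* = g(u^*) = \frac{2}{1+c}, \\
\mathbb{E}\left[\frac{\mathbf{1}\{U \le c\,g(U)\}}{g(U)}\right]
&= \int_0^{u^*} \frac{du}{2-u}
 = \log\frac{2}{2-u^*}
 = \log(1+c) \le c .
\end{align*}
```

In this example $u^* = cC^*$ holds with equality, while the final bound $\log(1+c) \le c$ is strict for $c > 0$.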



Part (ii). The proof uses a telescopic sum argument similar to the one developed
by Benjamini and Yekutieli (2001) for proving FDR control of the linear step-up
procedure under the PRDS assumption; the goal of the lemma presented here is to
isolate this argument in order to specifically concentrate on condition DC, and
to extend it to arbitrary (non-discrete) variables.
We want to prove the inequality

\[
\mathbb{E}\left[\frac{\mathbf{1}\{U \le cV\}}{V}\right] \le c
\]

for $U$, $V$ two nonnegative real variables such that $U$ is stochastically lower
bounded by a uniform distribution, and the conditional distribution of $V$ given
$U \le u$ is stochastically decreasing in $u$. Fix some $\varepsilon > 0$ and some $\rho \in (0, 1)$
and choose $K$ large enough so that $\rho^K < \varepsilon$. Put $v_0 = 0$ and $v_i = \rho^{K+1-i}$ for
$1 \le i \le 2K + 1$. The following chain of inequalities holds:

\begin{align*}
\mathbb{E}\left[\frac{\mathbf{1}\{U \le cV\}}{V \vee \varepsilon}\right]
&\le \sum_{i=1}^{2K+1} \frac{\mathbb{P}(U \le c v_i ;\, V \in [v_{i-1}, v_i))}{v_{i-1} \vee \varepsilon} + \varepsilon \\
&\le c \sum_{i=1}^{2K+1} \frac{\mathbb{P}(U \le c v_i ;\, V \in [v_{i-1}, v_i))}{\mathbb{P}(U \le c v_i)} \, \frac{v_i}{v_{i-1} \vee \varepsilon} + \varepsilon \\
&\le c \rho^{-1} \sum_{i=1}^{2K+1} \mathbb{P}(V \in [v_{i-1}, v_i) \mid U \le c v_i) + \varepsilon \\
&= c \rho^{-1} \sum_{i=1}^{2K+1} \bigl( \mathbb{P}(V < v_i \mid U \le c v_i) - \mathbb{P}(V < v_{i-1} \mid U \le c v_i) \bigr) + \varepsilon \\
&\le c \rho^{-1} \sum_{i=1}^{2K+1} \bigl( \mathbb{P}(V < v_i \mid U \le c v_i) - \mathbb{P}(V < v_{i-1} \mid U \le c v_{i-1}) \bigr) + \varepsilon \\
&\le c \rho^{-1} + \varepsilon .
\end{align*}

We obtain the conclusion by letting $\rho \to 1$, $\varepsilon \to 0$ and applying the monotone
convergence theorem.

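Part (iii). Rewriting, for any $z > 0$, $1/z = \int_0^{+\infty} v^{-2} \mathbf{1}\{v \ge z\}\, dv$, and using
Fubini's theorem:

\begin{align*}
\mathbb{E}\left[\frac{\mathbf{1}\{U \le c\beta(V)\}}{V}\right]
&= \mathbb{E}\left[\int_0^{+\infty} v^{-2} \mathbf{1}\{v \ge V\} \mathbf{1}\{U \le c\beta(V)\}\, dv\right] \\
&= \int_0^{+\infty} v^{-2}\, \mathbb{E}\bigl[\mathbf{1}\{v \ge V\} \mathbf{1}\{U \le c\beta(V)\}\bigr]\, dv \\
&\le \int_0^{+\infty} v^{-2}\, \mathbb{P}(U \le c\beta(v))\, dv \\
&\le c \int_0^{+\infty} v^{-2} \beta(v)\, dv \\
&= c \int_{u > 0} \int_{v > 0} u\, \mathbf{1}\{u \le v\}\, v^{-2}\, dv\, d\nu(u) = c .
\end{align*}

□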
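
The last equality in part (iii) rests on the identity $\int_0^{+\infty} v^{-2}\beta(v)\,dv = 1$. When $\nu$ is supported on finitely many positive integers, $\beta$ is a step function and the integral can be computed exactly. The Python sketch below does this with rational arithmetic; the helper names and the two example distributions are illustrative assumptions.

```python
from fractions import Fraction


def shape_function(nu):
    """beta(r) = sum of k * nu[k] over atoms k <= r, for nu a dict {k: mass}."""
    return lambda r: sum(k * mass for k, mass in nu.items() if k <= r)


def integral_of_beta_over_v_squared(nu):
    """Exact value of the integral of beta(v) / v**2 over (0, +inf).

    beta vanishes below the smallest atom and is constant between consecutive
    atoms; the integral of v**-2 over [a, b) is 1/a - 1/b, and 1/a on the
    last (infinite) piece.
    """
    beta = shape_function(nu)
    atoms = sorted(nu)
    total = Fraction(0)
    for i, a in enumerate(atoms):
        upper = Fraction(1, atoms[i + 1]) if i + 1 < len(atoms) else Fraction(0)
        total += beta(a) * (Fraction(1, a) - upper)
    return total


m = 5
h_m = sum(Fraction(1, k) for k in range(1, m + 1))
nu_harmonic = {k: Fraction(1, k) / h_m for k in range(1, m + 1)}  # mass prop. to 1/k
nu_uniform = {k: Fraction(1, m) for k in range(1, m + 1)}         # equal masses
```

Calling `integral_of_beta_over_v_squared` on either distribution returns exactly 1, as the identity requires for any probability distribution $\nu$ on the positive half-line.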
Appendix B: Proof of Theorem 4.6



To establish the first assertion of the theorem, recall that we assume the p-values
of $\mathcal{H}_1$ are PRDSS on $\mathcal{H}_0$, and that the threshold collection is $\Delta(i) = \alpha m/(m-i+1)^2$.
Assume $m_0 > 0$ (otherwise the result is trivial) and consider $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$,
the ordered p-values of $(p_h, h \in \mathcal{H})$. Denote by $j_0$ the (data-dependent)
smallest integer $j \ge 1$ for which $p_{(j)}$ corresponds to a true null hypothesis.
Denote by $R_1$ the step-down procedure of threshold collection $\Delta$ restricted
to the set of the false null hypotheses $\mathcal{H}_1$. First note that the following points
hold:

(i) $|R \cap \mathcal{H}_0| > 0 \;\Rightarrow\; p_{(j_0)} \le \dfrac{\alpha m}{(m - j_0 + 1)^2}$,
(ii) $|R \cap \mathcal{H}_0| > 0 \;\Rightarrow\; j_0 - 1 \le |R_1|$,
(iii) $R_1 \subset R \cap \mathcal{H}_1$.

To prove this, suppose that $|R \cap \mathcal{H}_0| > 0$, so that the null hypothesis corresponding
to $p_{(j_0)}$ is rejected by $R$. Hence, from the definition of a step-down procedure
we have $p_{(j_0)} \le \Delta(j_0)$ and (i) holds. Moreover, since for all $j \le j_0 - 1$, we have
$p_{(j)} \le \Delta(j)$ and $p_{(j)}$ corresponds to a false null hypothesis, $R_1$ necessarily rejects
all the null hypotheses corresponding to $p_{(j)}$, $j \le j_0 - 1$, and we get (ii).
Finally, we obviously have $R_1 \subset \mathcal{H}_1$ and it is easy to check that $R_1 \subset R$ (using
the fact that the reordered p-values of $\mathcal{H}_1$ form a subsequence of $(p_{(j)})$).
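
The three points can be checked on a small example. The Python sketch below implements a step-down procedure for the threshold collection $\alpha m/(m-i+1)^2$ and applies it both to all p-values and to those of the false nulls only. Keeping the same $m$ in the thresholds for the restricted procedure is our reading of the definition of $R_1$; this reading, the function name and the data are assumptions made for illustration.

```python
def step_down(pvalues, thresholds):
    """Reject the k smallest p-values, where k is the largest integer such
    that p_(i) <= thresholds[i - 1] for every i <= k.

    pvalues is a dict {hypothesis: p-value}; returns the set of rejected keys.
    """
    ordered = sorted(pvalues, key=pvalues.get)
    rejected = set()
    for i, h in enumerate(ordered):
        if pvalues[h] > thresholds[i]:
            break
        rejected.add(h)
    return rejected


alpha = 0.05
# Hypothetical p-values; keys "t*" are true nulls, keys "f*" are false nulls.
pvalues = {"f1": 0.0001, "f2": 0.002, "t1": 0.004, "f3": 0.03, "t2": 0.6}
m = len(pvalues)
thresholds = [alpha * m / (m - i + 1) ** 2 for i in range(1, m + 1)]

false_nulls = {h: p for h, p in pvalues.items() if h.startswith("f")}
R = step_down(pvalues, thresholds)
R1 = step_down(false_nulls, thresholds)  # same thresholds, false nulls only

# j0 = rank of the smallest p-value belonging to a true null.
ordered = sorted(pvalues, key=pvalues.get)
j0 = 1 + next(i for i, h in enumerate(ordered) if h.startswith("t"))
```

Here `R` rejects four hypotheses (one of them a true null), `R1` rejects `f1` and `f2`, and `j0` equals 3, so that points (i) to (iii) can be read off directly.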
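From (i) and (ii) we deduce that

\[
|R \cap \mathcal{H}_0| > 0 \;\Rightarrow\; \exists h \in \mathcal{H}_0 : \;
p_h \le \frac{\alpha m}{(m - |R_1|)^2} \le \frac{\alpha m}{m_0 (m - |R_1|)} . \tag{9}
\]

Therefore,

\begin{align*}
\mathrm{FDR}(R)
&= \mathbb{E}\left[\frac{|R \cap \mathcal{H}_0|}{|R|} \mathbf{1}\{|R \cap \mathcal{H}_0| > 0\}\right]
 = \mathbb{E}\left[\frac{|R \cap \mathcal{H}_0|}{|R \cap \mathcal{H}_0| + |R \cap \mathcal{H}_1|} \mathbf{1}\{|R \cap \mathcal{H}_0| > 0\}\right] \\
&\le \mathbb{E}\left[\frac{m_0}{m_0 + |R \cap \mathcal{H}_1|} \mathbf{1}\{|R \cap \mathcal{H}_0| > 0\}\right] \\
&\le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[\frac{m_0}{m_0 + |R_1|} \mathbf{1}\{p_h \le (\alpha m/m_0)(m - |R_1|)^{-1}\}\right] ,
\end{align*}

where for the first inequality, we used the fact that for each fixed $a \ge 0$, $x \mapsto x/(x+a)$
is a nondecreasing function on $\mathbb{R}_+ \setminus \{0\}$. For the second inequality, we used
simultaneously (9) and the point (iii) above. Since the function $x \mapsto \frac{m_0}{m_0 + x} \cdot \frac{m}{m - x}$
is log-convex on $[0, m_1]$ and takes the value 1 at $x = 0$ and at $x = m_1$, we have
pointwise $\frac{m_0}{m_0 + |R_1|} \cdot \frac{m}{m - |R_1|} \le 1$. Therefore, we get

\[
\mathrm{FDR}(R)
\le \frac{1}{m} \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[\frac{\mathbf{1}\{p_h \le (\alpha m/m_0)(m - |R_1|)^{-1}\}}{(m - |R_1|)^{-1}}\right]
\le \frac{1}{m} \sum_{h \in \mathcal{H}_0} \frac{\alpha m}{m_0}
= \alpha .
\]

In the last inequality, we used that the couple $(p_h, (m - |R_1|)^{-1})$ satisfies condition
DC($\beta$) with $c = \alpha m/m_0$ and $\beta(x) = x$; this holds in the present case
from part (ii) of Lemma 3.2 because for any $v > 0$, $D = \{z \in [0,1]^{\mathcal{H}_1} \mid (m - |R_1(z)|)^{-1} < v\}$
is a nondecreasing set (so that we can apply the same reasoning
as for the proof of Proposition 3.6).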
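
The pointwise bound $\frac{m_0}{m_0 + |R_1|} \cdot \frac{m}{m - |R_1|} \le 1$ used in this proof can also be verified by a direct computation, given here as a complement to the log-convexity argument. Writing $x = |R_1|$ and using $m = m_0 + m_1$:

```latex
\[
(m_0 + x)(m - x) - m_0 m = x\,(m - m_0 - x) = x\,(m_1 - x) \ge 0
\quad \text{for } x \in [0, m_1],
\qquad \text{hence} \qquad
\frac{m_0}{m_0 + x} \cdot \frac{m}{m - x} \le 1 ,
\]
```

with equality exactly when $x = 0$ or $x = m_1$.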

For the second part of the theorem, we follow exactly the same proof as 
above with the modified threshold function and part (iii) of Lemma 3.2 instead 
of part (ii). □ 